A Global Convergence Theory for the Celis-Dennis-Tapia
Trust Region Algorithm for Constrained Optimization1


by 


Mahmoud  El-Alem2 


Abstract. A global convergence theory for a class of trust-region algorithms
for solving the equality constrained optimization problem is presented. This
theory is sufficiently general that it holds for any algorithm that generates steps
that give at least a fraction of Cauchy decrease in the quadratic model of the
constraints and uses the augmented Lagrangian as a merit function. This theory is
used to establish global convergence of the 1984 Celis-Dennis-Tapia algorithm
with a different scheme for updating the penalty parameter. The behavior of the
penalty parameter is also discussed.
1. Introduction

In this research, we consider the following equality constrained optimization
problem:

$$\begin{aligned} &\text{minimize} && f(x) \\ &\text{subject to} && h_i(x) = 0, \quad i = 1, \dots, m, \end{aligned} \tag{EQ}$$

where $f$ and $h_i$ are assumed to be smooth nonlinear functions defined from
$\mathbb{R}^n$ into $\mathbb{R}$. A more detailed list of assumptions will be explicitly presented
later. We will denote by $h(x)$ the vector whose components are
$h_i(x)$. It is convenient to introduce the Lagrangian function
$l : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ associated with problem (EQ). It is the function

$$l(x, \lambda) = f(x) + \lambda^T h(x), \tag{1.1}$$

where $\lambda = (\lambda_1, \dots, \lambda_m)^T$ is the Lagrange multiplier vector. The augmented
Lagrangian function $\Phi : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R} \to \mathbb{R}$ associated with problem (EQ) is
the function

$$\Phi(x, \lambda; r) = l(x, \lambda) + r \|h(x)\|_2^2, \tag{1.2}$$

where $r$ is the penalty parameter.

Stating necessary optimality conditions in terms of the Lagrangian function
requires a constraint qualification. A satisfactory but somewhat restrictive
constraint qualification is the regularity assumption: the vectors
$\nabla h_i(x_*)$, $i = 1, \dots, m$, are linearly independent at the solution. We use the notation
$\nabla h(x)$ for the matrix whose columns are $\nabla h_i(x)$.

The first-order necessary conditions, or Kuhn-Tucker conditions, for a point
$x_* \in \mathbb{R}^n$ to be a solution of problem (EQ) are that $x_*$ be a feasible point (i.e.,
$h(x_*) = 0$), and that there exists a Lagrange multiplier $\lambda_*$ such that
$\nabla_x l(x_*, \lambda_*) = 0$. Equivalent first-order necessary conditions are that $x_*$ be a
feasible point and that $P(x_*) \nabla f(x_*) = 0$, where $P(x)$ is the projection onto the
null space of $\nabla h(x)^T$, i.e.,

$$P(x) = I - \nabla h(x) \left( \nabla h(x)^T \nabla h(x) \right)^{-1} \nabla h(x)^T. \tag{1.3}$$

For a detailed discussion of optimality conditions, see, for example, Fiacco and
McCormick (1968).
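As an illustration of the projected-gradient form of the first-order conditions, the following minimal sketch evaluates $|h(x)|$ and $\|P(x) \nabla f(x)\|_2$ for a single constraint ($m = 1$), where (1.3) reduces to $P = I - a a^T / (a^T a)$ with $a = \nabla h(x)$. The toy problem ($f(x) = x_1 + x_2$, $h(x) = x_1^2 + x_2^2 - 2$) and the function names are assumptions chosen for illustration; they do not come from the paper.

```python
def project_out(a, v):
    """Apply P = I - a a^T / (a^T a) to v, for one nonzero constraint gradient a."""
    coeff = sum(ai * vi for ai, vi in zip(a, v)) / sum(ai * ai for ai in a)
    return [vi - coeff * ai for ai, vi in zip(a, v)]


def first_order_residual(x):
    """Return (|h(x)|, ||P(x) grad f(x)||_2) for the hypothetical toy problem."""
    # Toy problem (assumption): f(x) = x1 + x2, h(x) = x1^2 + x2^2 - 2.
    grad_f = [1.0, 1.0]
    grad_h = [2.0 * x[0], 2.0 * x[1]]
    h = x[0] ** 2 + x[1] ** 2 - 2.0
    projected = project_out(grad_h, grad_f)
    return abs(h), sum(c * c for c in projected) ** 0.5


# At x = (-1, -1) both residuals vanish, so the point is stationary.
feasibility, stationarity = first_order_residual([-1.0, -1.0])
```

At the feasible point $(1, -1)$ the projected gradient does not vanish, so that point is feasible but not stationary.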

Problem (EQ) is often solved by the Successive Quadratic Programming
(SQP) algorithm. Namely, at the $k$-th iteration, the step is computed by solving
the following quadratic programming subproblem:

$$\begin{aligned} &\text{minimize} && \nabla_x l(x_k, \lambda_k)^T s + \tfrac{1}{2} s^T B_k s \\ &\text{subject to} && h(x_k) + \nabla h(x_k)^T s = 0, \end{aligned} \tag{QP}$$

where $B_k$ is the Hessian of the Lagrangian at $(x_k, \lambda_k)$ or an approximation to it.

The local convergence analysis for the SQP algorithm has been well
established [for example, see Tapia (1977), (1978)]. The area of global convergence is
currently receiving much attention.

Trust region approaches for unconstrained optimization have proven to be
very successful both theoretically and practically. The most natural way to
introduce the trust region idea into constrained optimization is to add a constraint
which restricts the size of the step in problem (QP). That is, at the $k$-th iteration
we solve the following trust-region quadratic programming subproblem:

$$\begin{aligned} &\text{minimize} && \nabla_x l(x_k, \lambda_k)^T s + \tfrac{1}{2} s^T B_k s \\ &\text{subject to} && h(x_k) + \nabla h(x_k)^T s = 0, \\ & && \|s\|_2 \le \Delta_k. \end{aligned} \tag{TRQP}$$

However, this approach may lead to inconsistent constraints if $h(x_k) \ne 0$. To
overcome this difficulty, two main approaches have been introduced. The first
approach is to relax the constraints by considering the following subproblem:

$$\begin{aligned} &\text{minimize} && \nabla_x l(x_k, \lambda_k)^T s + \tfrac{1}{2} s^T B_k s \\ &\text{subject to} && \alpha\, h(x_k) + \nabla h(x_k)^T s = 0, \\ & && \|s\|_2 \le \Delta_k, \end{aligned}$$

where $0 < \alpha < 1$. This approach was first introduced by Vardi (1985). It was
also used by Byrd, Schnabel, and Shultz (1987). This approach always leads to a
feasible subproblem if $\alpha$ is chosen properly. However, it suffers from
the disadvantage that the step depends on the unknown parameter $\alpha$, for which
there is no clear way of choosing a value.

The second approach is to add the trust-region constraint to a somewhat
different problem. At the $k$-th iteration the step is taken to be the one that
minimizes the quadratic model of the Lagrangian subject to some required
decrease in $\|h(x_k) + \nabla h(x_k)^T s\|_2$. This idea was first introduced by Celis,
Dennis, and Tapia (1985). At each iteration the step is computed by solving the
following subproblem:

$$\begin{aligned} &\text{minimize} && \nabla_x l(x_k, \lambda_k)^T s + \tfrac{1}{2} s^T B_k s \\ &\text{subject to} && \|h(x_k) + \nabla h(x_k)^T s\|_2 \le \theta_k, \\ & && \|s\|_2 \le \Delta_k, \end{aligned} \tag{CDT}$$

where $\theta_k$ and $\Delta_k$ are positive constants.

Celis, Dennis and Tapia (1985) chose $\theta_k$ to be $\|h(x_k) + \nabla h(x_k)^T s_k^{cp}\|_2$,
where $s_k^{cp} = -\alpha_k \nabla h(x_k) h(x_k)$ is the step to the Cauchy point, i.e., the
minimizer in the trust region $\{s : \|s\|_2 \le \Delta_k\}$ of $\|h(x_k) + \nabla h(x_k)^T s\|_2$ along its
negative gradient direction. That is, the Celis-Dennis-Tapia step is chosen from
the set of steps from $x_k$ that are inside the trust region and give at least as much
descent on the 2-norm of the residual of the linearized constraints as does the
Cauchy step.

In 1986, Powell and Yuan introduced another way of choosing $\theta_k$. They
chose it to be any number that satisfies

$$\theta_k = \min \left\{ \|h(x_k) + \nabla h(x_k)^T s\|_2 : \|s\|_2 \le \sigma \Delta_k \right\},$$

for some $0 < \sigma < 1$ [see Powell and Yuan (1986-b)].

A more general choice of $\theta_k$ was suggested by Celis, Dennis, Martinez,
Tapia, and Williamson (1989). They chose it to be

$$\theta_k = (1 - r) \|h(x_k)\|_2 + r \|h(x_k) + \nabla h(x_k)^T s_k^{cp}\|_2, \tag{1.4}$$

for some $0 < r < 1$, where $s_k^{cp}$ is the step to the Cauchy point defined
above.

This latter choice of $\theta_k$ enforces a fraction of Cauchy decrease on the
2-norm of the linearized constraints in the CDT subproblem. Powell and Yuan's
choice of $\theta_k$ enforces a fraction of optimal decrease [see Celis, Dennis, Martinez,
Tapia, and Williamson (1989)]. The choice given by Celis, Dennis and Tapia
(1985) gives at least as much decrease in the 2-norm of the linearized constraints
as does the Cauchy step $s_k^{cp}$. Other choices of $\theta_k$ are suggested in Celis, Dennis,
Martinez, Tapia, and Williamson (1989). We are going to consider only the choice
of $\theta_k$ given by (1.4). This choice is appropriate since it ensures considerable
freedom in the subproblem feasible set, allowing the minimization of the subproblem
objective function to pull the iterate toward the optimal point for problem (EQ)
rather than progressing too fast toward nonlinear feasibility at the expense of
optimality. Our numerical experiments reinforce the validity of this choice [see
Dennis, El-Alem, and Tapia (1989)].
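The Cauchy step and the choice (1.4) can be sketched numerically as follows. This is a minimal pure-Python sketch, not the authors' implementation: the matrix $\nabla h(x_k)$ is assumed to be stored as a list of $n$ rows of length $m$, the function names are hypothetical, and the step length is the minimizer of $\|h + \nabla h^T s\|_2^2$ along $-\nabla h\, h$, truncated at the trust-region boundary.

```python
import math


def cauchy_step(A, h, delta):
    """Cauchy step for ||h + A^T s||_2 inside ||s||_2 <= delta.

    A is grad h(x_k) as a list of n rows (n-by-m); h has length m.
    """
    n, m = len(A), len(h)
    g = [sum(A[i][j] * h[j] for j in range(m)) for i in range(n)]    # g = A h
    Atg = [sum(A[i][j] * g[i] for i in range(n)) for j in range(m)]  # A^T g
    g_norm = math.sqrt(sum(v * v for v in g))
    if g_norm == 0.0:
        return [0.0] * n  # no descent direction for the linearized residual
    curv = sum(v * v for v in Atg)  # ||A^T A h||^2, positive whenever g != 0
    if g_norm ** 3 / curv >= delta:
        alpha = delta / g_norm       # minimizer lies outside: stop at the boundary
    else:
        alpha = g_norm ** 2 / curv   # unconstrained minimizer along -g
    return [-alpha * v for v in g]


def linearized_residual(A, h, s):
    """Return ||h + A^T s||_2."""
    n, m = len(A), len(h)
    return math.sqrt(sum((h[j] + sum(A[i][j] * s[i] for i in range(n))) ** 2
                         for j in range(m)))


def theta(A, h, delta, r):
    """theta_k from (1.4): a convex combination of ||h|| and the Cauchy residual."""
    s_cp = cauchy_step(A, h, delta)
    return (1.0 - r) * linearized_residual(A, h, [0.0] * len(A)) \
        + r * linearized_residual(A, h, s_cp)


A = [[1.0, 0.0], [0.0, 1.0]]
h = [3.0, 4.0]
s_small = cauchy_step(A, h, 1.0)    # boundary step of length 1
s_large = cauchy_step(A, h, 10.0)   # interior step that zeroes the residual
theta_k = theta(A, h, 1.0, 0.5)
```

With $r = 1$ the function `theta` returns the Celis-Dennis-Tapia (1985) choice, and smaller $r$ enlarges the feasible set of (CDT).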

In this paper we consider a trust-region algorithm for solving the equality
constrained optimization problem. This algorithm is a variant of the 1984
Celis-Dennis-Tapia trust-region algorithm in that it uses a different scheme for
updating the penalty parameter.

The remainder of this paper is organized as follows. In Section 2, we describe
in detail the trust-region subproblem that will be considered and the way of
computing the trial steps. A scheme for updating the radius of the trust region is
presented together with a discussion about the criteria for accepting or rejecting
the trial steps. A scheme for updating the penalty parameter is also presented.
In Section 3, we present the algorithm. In Section 4, we state the standard
assumptions under which our global convergence theory is established. In Section
5, we state our main global convergence results. Sections 6, 7 and 8 are devoted to
the analysis of the global behavior of our algorithm. Section 9 contains
concluding remarks.

Notation: 

The trial step at the $k$-th iteration is denoted by $s_k$ and its associated
Lagrange multiplier step by $\Delta\lambda_k$. If the step is accepted it will be denoted by $s_k$
and its associated Lagrange multiplier step by $\Delta\lambda_k$.

The decomposition of the step $s_k$ into a tangential and a normal component
is considered. These components are denoted by $s_k^t$ and $s_k^n$ respectively and are
defined by $s_k^t = P(x_k) s_k$ and $s_k^n = Q(x_k) s_k$, where

$$P(x_k) = I - \nabla h(x_k) \left( \nabla h(x_k)^T \nabla h(x_k) \right)^{-1} \nabla h(x_k)^T \quad \text{and} \quad Q(x_k) = I - P(x_k).$$

The expressions $\nabla^2 h(x_k) \Delta\lambda$ and $\nabla^2 h(x_k) h(x_k)$ are used to denote
$\sum_{i=1}^{m} \Delta\lambda_i \nabla^2 h_i(x_k)$ and $\sum_{i=1}^{m} h_i(x_k) \nabla^2 h_i(x_k)$ respectively. The matrix $B_k$ denotes
$\nabla_x^2 l(x_k, \lambda_k)$ or an approximation to it.

Subscripted values of functions denote evaluation at a particular point. For
example, $f_k$ means $f(x_k)$.
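The tangential and normal decomposition can be checked on a small example. The sketch below assumes a single constraint ($m = 1$), so that $Q = a a^T / (a^T a)$ with $a = \nabla h(x_k)$; the function name and the sample vectors are hypothetical.

```python
def split_step(grad_h, s):
    """Split s into (s_t, s_n) for a single nonzero constraint gradient grad_h."""
    coeff = sum(a * v for a, v in zip(grad_h, s)) / sum(a * a for a in grad_h)
    s_n = [coeff * a for a in grad_h]        # Q s: component along grad_h
    s_t = [v - w for v, w in zip(s, s_n)]    # P s: component in the null space of grad_h^T
    return s_t, s_n


grad_h = [3.0, 4.0]
s = [1.0, 2.0]
s_t, s_n = split_step(grad_h, s)
```

The two components are orthogonal and sum to $s$, which is the property used later when the trust-region radius is split between them.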
2. Description of The Algorithm

The algorithm is iterative. At each iteration a trial step $s_k$ is obtained by
solving a model problem. At the $k$-th iteration, we try to update the estimate of
the solution $x_k$ to an improved estimate $x_{k+1}$. To do this, the step $s_k^{QP}$ and
the multiplier step $\Delta\lambda_k^{QP}$ are obtained by solving the QP subproblem (see Section 1).
If they exist and if $s_k^{QP}$ lies inside the trust region, i.e., if $\|s_k^{QP}\|_2 \le \Delta_k$, then
we set $s_k = s_k^{QP}$ and $\Delta\lambda_k = \Delta\lambda_k^{QP}$. Otherwise, the CDT subproblem will be
solved (see Section 1). On the other hand, if $x_k$ is feasible, then we solve the
TRQP subproblem (see Section 1). This can be summarized in the following
scheme:

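SCHEME 2.1 Computing the Trial Step

Solve (QP) to get $s_k^{QP}$ and $\Delta\lambda_k^{QP}$ (see Section 1).

If $\|s_k^{QP}\|_2 \le \Delta_k$

    then $s_k = s_k^{QP}$,

         $\Delta\lambda_k = \Delta\lambda_k^{QP}$.

Else, if $x_k$ is feasible

    then solve (TRQP) (see Section 1).

    Set $s_k = s_k^{TRQP}$,

        $\Delta\lambda_k = -(\nabla h_k^T \nabla h_k)^{-1} \nabla h_k^T (\nabla_x l_k + B_k s_k^{TRQP})$.

Else, solve (CDT) (see Section 1).

    Set $s_k = s_k^{CDT}$,

        $\Delta\lambda_k = -(\nabla h_k^T \nabla h_k)^{-1} \nabla h_k^T (\nabla_x l_k + B_k s_k^{CDT})$.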




When $x_k$ is feasible, then $\theta_k = 0$, and the CDT subproblem reduces to the
TRQP subproblem. This shows a continuity in the behavior of $x_k$ on switching
between the CDT and the TRQP subproblems.

If the trial step is either the TRQP step or the CDT step, our choice of the
multiplier requires solving the following linear system in the least-squares sense:

$$\nabla h_k \, \Delta\lambda_k = -(\nabla_x l_k + B_k s_k).$$

Powell and Yuan (1986-a and 1986-b) have used as a multiplier update
formula the least-squares multiplier estimate. Using this formula, the following linear
system has to be solved in the least-squares sense:

$$\nabla h(x_k + s_k) \, \Delta\lambda_k = -\left( \nabla f(x_k + s_k) + \nabla h(x_k + s_k) \lambda_k \right).$$

Powell and Yuan's choice of $\Delta\lambda_k$ is more expensive since it requires a
factorization of $\nabla h(x_k + s_k)$ at each trial step. Our choice of $\Delta\lambda_k$ requires the
factorization only when the algorithm moves to a new point after finding an acceptable
step. On the other hand, when the SQP step is taken, our multiplier is obtained
with no extra cost because it is the SQP multiplier.

Let $s_k$ be the step computed by the algorithm and $\Delta\lambda_k$ be the corresponding
Lagrange multiplier step. We test whether the point $(x_k + s_k, \lambda_k + \Delta\lambda_k)$ is a
better approximation to the solution $(x_*, \lambda_*)$. In order to do this, we use, as a
merit function, the augmented Lagrangian (1.2).

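The actual reduction in the merit function in going from $(x_k, \lambda_k)$ to
$(x_k + s_k, \lambda_k + \Delta\lambda_k)$ is given by

$$Ared_k = \Phi(x_k, \lambda_k; r_k) - \Phi(x_k + s_k, \lambda_k + \Delta\lambda_k; r_k).$$

We can write

$$Ared_k = l(x_k, \lambda_k) - l(x_k + s_k, \lambda_k) - \Delta\lambda_k^T h(x_k + s_k) + r_k \left[ \|h(x_k)\|_2^2 - \|h(x_k + s_k)\|_2^2 \right]. \tag{2.1}$$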
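The calculation of the step $s_k$ is based on a quadratic approximation of the
Lagrangian function and a linear approximation to the constraints. Now by using
the same approximations, we can compute the predicted reduction

$$Pred_k = \Phi(x_k, \lambda_k; r_k) - \Psi(x_k, s_k, \lambda_k, \Delta\lambda_k; r_k),$$

where $\Psi(x_k, s_k, \lambda_k, \Delta\lambda_k; r_k)$ is an approximation to $\Phi(x_k + s_k, \lambda_k + \Delta\lambda_k; r_k)$ and is
defined by

$$\begin{aligned} \Psi(x_k, s_k, \lambda_k, \Delta\lambda_k; r_k) = {} & l(x_k, \lambda_k) + \nabla_x l(x_k, \lambda_k)^T s_k + \tfrac{1}{2} s_k^T B_k s_k \\ & + \Delta\lambda_k^T \left[ h(x_k) + \nabla h(x_k)^T s_k \right] \\ & + r_k \|h(x_k) + \nabla h(x_k)^T s_k\|_2^2. \end{aligned}$$

Hence,

$$\begin{aligned} Pred_k = {} & -\nabla_x l(x_k, \lambda_k)^T s_k - \tfrac{1}{2} s_k^T B_k s_k - \Delta\lambda_k^T \left( h(x_k) + \nabla h(x_k)^T s_k \right) \\ & + r_k \left[ \|h(x_k)\|_2^2 - \|h(x_k) + \nabla h(x_k)^T s_k\|_2^2 \right]. \end{aligned} \tag{2.2}$$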
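For concreteness, formula (2.2) can be evaluated directly from the problem data. The following is a minimal sketch, assuming $\nabla h(x_k)$ is stored as a list of $n$ rows of length $m$ and $B_k$ as a list of rows; the function and argument names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def predicted_reduction(grad_l, B, A, h, s, dlam, r):
    """Evaluate Pred_k from (2.2).

    grad_l: gradient of the Lagrangian in x at (x_k, lambda_k)
    B:      n-by-n model Hessian B_k
    A:      grad h(x_k), n-by-m, as a list of n rows
    h:      constraint values h(x_k)
    s:      trial step s_k
    dlam:   multiplier step
    r:      penalty parameter r_k
    """
    n, m = len(A), len(h)
    lin = [h[j] + sum(A[i][j] * s[i] for i in range(n)) for j in range(m)]  # h + A^T s
    quad = dot(grad_l, s) + 0.5 * dot(s, matvec(B, s))
    return -quad - dot(dlam, lin) + r * (dot(h, h) - dot(lin, lin))


B = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0], [0.0]]
pred_full = predicted_reduction([1.0, 0.0], B, A, [1.0], [-1.0, 0.0], [0.0], 1.0)
pred_half = predicted_reduction([1.0, 0.0], B, A, [1.0], [-0.5, 0.0], [2.0], 1.0)
```

The last term is the only one multiplied by the penalty parameter, which is why increasing $r_k$ can always make the predicted reduction positive when the linearized constraint residual decreases.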


We accept the step and set $x_{k+1} = x_k + s_k$ and $\lambda_{k+1} = \lambda_k + \Delta\lambda_k$ if
$\frac{Ared_k}{Pred_k} \ge \eta_1$, where $\eta_1 \in (0, 1)$ is a small fixed constant.

If the step is rejected, then we set $x_{k+1} = x_k$ and $\lambda_{k+1} = \lambda_k$ and decrease
the radius of the trust region by picking $\Delta_{k+1} \in [\alpha_1 \|s_k\|_2, \alpha_2 \|s_k\|_2]$,
where $0 < \alpha_1 < \alpha_2 < 1$ [see Dennis and Schnabel (1983)].

If the step is accepted, then the trust-region radius is updated by comparing
the value of $Ared_k$ with $Pred_k$. Namely, if $\eta_1 \le \frac{Ared_k}{Pred_k} < \eta_2$, where
$\eta_2 \in (\eta_1, 1)$, then the radius of the trust region is updated by the rule
$\Delta_{k+1} = \min[\Delta_k, \alpha_3 \|s_k\|_2]$, where $\alpha_3 > 1$. However, if
$\frac{Ared_k}{Pred_k} \ge \eta_2$,
then we increase the radius of the trust region by setting
$\Delta_{k+1} = \min[\Delta_*, \max(\Delta_k, \alpha_3 \|s_k\|_2)]$, where $\Delta_*$ is a positive constant.
This can be summarized in the following scheme:


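SCHEME 2.2 Testing the Step and Updating the Trust Region Radius

If $\frac{Ared_k}{Pred_k} < \eta_1$,

    then set $x_{k+1} = x_k$,

             $\lambda_{k+1} = \lambda_k$,

             $\Delta_{k+1} \in [\alpha_1 \|s_k\|_2, \alpha_2 \|s_k\|_2]$.    (2.3)

Else, if $\eta_1 \le \frac{Ared_k}{Pred_k} < \eta_2$

    then set $x_{k+1} = x_k + s_k$,

             $\lambda_{k+1} = \lambda_k + \Delta\lambda_k$,

             $\Delta_{k+1} = \min[\Delta_k, \alpha_3 \|s_k\|_2]$.

Else, set $x_{k+1} = x_k + s_k$,

          $\lambda_{k+1} = \lambda_k + \Delta\lambda_k$,

          $\Delta_{k+1} = \min[\Delta_*, \max(\Delta_k, \alpha_3 \|s_k\|_2)]$.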
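Scheme 2.2 translates almost line by line into code. In the sketch below the default parameter values are illustrative assumptions (the paper only requires $0 < \eta_1 < \eta_2 < 1$ and $0 < \alpha_1 < \alpha_2 < 1 < \alpha_3$), the rejected-step radius is taken at the upper end $\alpha_2 \|s_k\|_2$ of the allowed interval, and $Pred_k > 0$ is assumed.

```python
def update_radius(ared, pred, delta, s_norm,
                  eta1=0.01, eta2=0.75,
                  alpha1=0.1, alpha2=0.5, alpha3=2.0,
                  delta_max=100.0):
    """Return (accepted, new_delta) following Scheme 2.2.

    delta_max plays the role of the constant Delta_* in the scheme.
    """
    ratio = ared / pred  # assumes pred > 0
    if ratio < eta1:
        # Reject: any radius in [alpha1 * s_norm, alpha2 * s_norm] is allowed;
        # this sketch picks the upper end of the interval.
        return False, alpha2 * s_norm
    if ratio < eta2:
        # Accept, but do not enlarge the trust region.
        return True, min(delta, alpha3 * s_norm)
    # Accept with good agreement: allow growth, capped at delta_max.
    return True, min(delta_max, max(delta, alpha3 * s_norm))


rejected = update_radius(0.0, 1.0, 1.0, 1.0)
moderate = update_radius(0.5, 1.0, 1.0, 0.25)
good = update_radius(0.9, 1.0, 1.0, 1.0)
```

When the step is accepted, the caller would also set $x_{k+1} = x_k + s_k$ and $\lambda_{k+1} = \lambda_k + \Delta\lambda_k$; otherwise both are left unchanged.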

Now, we describe our strategy for updating the penalty parameter $r$.
Numerical experiments have suggested that efficient performance of the algorithm
is linked to keeping the penalty parameter as small as possible. Our global
convergence theory requires that the sequence $\{r_k\}$ be nondecreasing and that the
predicted reduction in the merit function at each iteration be at least as much as
a fraction of Cauchy decrease in the 2-norm of the residual of the linearized
constraints. The idea now is to keep the penalty parameter as small as possible,
subject to satisfying these two conditions needed for our convergence theory. Hence,
our strategy will be to start with $r = 1$ and increase it only when necessary to
satisfy these two conditions. The following is the scheme that we use for
updating the penalty parameter.

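SCHEME 2.3 Updating the Penalty Parameter

If

    $Pred_k \ge \frac{r_{k-1}}{2} \left[ \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \right]$,

then set $r_k = r_{k-1}$.

Else, set

    $r_k = \dfrac{2 \left[ \nabla_x l_k^T s_k + \tfrac{1}{2} s_k^T B_k s_k + \Delta\lambda_k^T (h_k + \nabla h_k^T s_k) \right]}{\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2} + \rho$,

where $\rho > 0$ is a small fixed constant.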
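The penalty update can be sketched as a small function. The sketch assumes the reading of Scheme 2.3 in which $Pred_k$ is first evaluated with $r_{k-1}$ and, if the test fails, $r_k$ is set to twice the ratio of the model terms to the linearized constraint decrease, plus $\rho$. The argument names and the default value of `rho` are hypothetical.

```python
def update_penalty(r_prev, quad_term, constraint_decrease, rho=1e-2):
    """Return r_k following Scheme 2.3.

    quad_term           = grad_l^T s + 0.5 s^T B s + dlam^T (h + grad_h^T s)
    constraint_decrease = ||h||^2 - ||h + grad_h^T s||^2  (nonnegative)
    """
    pred = -quad_term + r_prev * constraint_decrease  # Pred_k with r = r_{k-1}
    if pred >= 0.5 * r_prev * constraint_decrease:
        return r_prev
    # The test failed, so quad_term > 0.5 * r_prev * constraint_decrease >= 0.
    # This sketch assumes constraint_decrease > 0 in that case.
    return 2.0 * quad_term / constraint_decrease + rho


r_same = update_penalty(1.0, -1.0, 2.0)
r_new = update_penalty(1.0, 3.0, 2.0, rho=0.5)
```

With the increased value, $Pred_k = -3 + 3.5 \cdot 2 = 4$ exceeds $\tfrac{1}{2} \cdot 3.5 \cdot 2 = 3.5$, so the required inequality holds with strict slack coming from $\rho$.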

The  initial  choice  of  the  penalty  parameter  r  =  1  is  scale  dependent.  Here 
we  assume  that  the  constraints  have  been  made  to  be  well  scaled  compared  to  the 
objective  function. 

Finally, we discuss the strategy for updating the matrix $B_k$. If $s_k$ is not an
acceptable step, then set $B_{k+1} = B_k$; otherwise compute
$B_{k+1} = \nabla^2 f_{k+1} + \nabla^2 h_{k+1} \lambda_{k+1}$ if the exact Hessian is used, or update $B_k$ by
some other update formula that satisfies the standard assumption 5 (see Section 4).
3. The Algorithm

The  following  represents  the  outline  of  the  Algorithm.  It  differs  from  the 
1984  Celis-Dennis-Tapia  algorithm  in  its  way  of  updating  the  penalty  parameter 
in  step  3  of  the  algorithm  and  in  its  way  of  updating  the  trust  region  radius  in 
step  4. 

Step 0 :

Set $x_0 \in \mathbb{R}^n$, $B_0 \in \mathbb{R}^{n \times n}$, $\lambda_0 \in \mathbb{R}^m$, $r_{-1} = 1$, $\rho > 0$,

$0 < \alpha_1 < \alpha_2 < 1 < \alpha_3$, $0 < \eta_1 < \eta_2 < 1$,

$0 < r < 1$, $\epsilon > 0$, $\Delta_0 > 0$, and $k = 0$.

Step 1 :

If

$$\|P_k \nabla f_k\|_2 + \|h_k\|_2 \le \epsilon, \tag{3.1}$$

where $P_k$ is defined by (1.3), stop.

Step 2 :

Compute $s_k$ and $\Delta\lambda_k$ according to Scheme 2.1 above.

Step 3 :

Update the penalty parameter according to Scheme 2.3 above.

Step 4 :

Test the step and update $\Delta_k$ according to Scheme 2.2 above.

Step 5 :

Update $B_k$ as above.

Step 6 :

Set $k := k + 1$ and go to step 1.

4.  The  Standard  Assumptions 

In  this  section  we  state  the  assumptions  under  which  we  prove  global  conver¬ 
gence. 

1) There exists a convex set $\Omega \subseteq \mathbb{R}^n$ such that, for all $k$, $x_k$ and
$x_k + s_k \in \Omega$.

2) $f$ and $h_i \in C^2(\Omega)$.

3) $\nabla h(x)$ has full column rank for all $x \in \Omega$.

4) $f(x)$, $h(x)$, $\nabla h(x)$, $\nabla f(x)$, $\nabla^2 f(x)$, $(\nabla h(x)^T \nabla h(x))^{-1}$ and each
$\nabla^2 h_i(x)$, for $i = 1, \dots, m$, are all uniformly bounded in norm in $\Omega$.

5) The matrices $\{B_k, k = 1, 2, \dots\}$ have a uniform upper bound.

If $\Omega$ were a compact set, assumption 4 would follow from continuity.

If the exact Hessian is used, assumption 5 is a strong one, since in most cases
it requires that the Lagrange multiplier estimates be uniformly bounded.
However, if an approximation to the Hessian of the Lagrangian is used, then any
update formula that satisfies the standard assumption 5 can be used. For
example, setting $B_k$ to a fixed matrix for all $k$ is a valid choice. The question of how
to use secant approximations of the Hessian of the Lagrangian in order to produce
a more efficient algorithm is a research topic (see Section 9). Typically, secant
updates can be shown to satisfy the standard assumption 5 only as a by-product
of the convergence analysis.

The same assumptions as our standard assumptions were used by Byrd,
Schnabel, and Shultz (1987) and Powell and Yuan (1986-b).


5.  The  Global  Convergence  Theory 


In  this  section  we  state  the  main  results  in  our  global  convergence  analysis  in 
order  to  understand  the  motivation  for  the  lemmas  presented  in  Sections  6  and  7. 
These  lemmas  are  necessary  to  the  proofs  of  our  main  global  convergence  results 
presented  in  Section  8. 

Section  6  is  devoted  to  presenting  all  results  that  deal  with  decrease  in  the 
merit  function.  The  behavior  of  the  penalty  parameter  is  discussed  in  Section  7. 

Theorem  5.1 


Under the standard assumptions, at any point $(x_k, \lambda_k)$ generated by the
algorithm, either the termination condition of the algorithm will be met or an
acceptable step will be found, i.e., the condition $\frac{Ared_{k+j}}{Pred_{k+j}} \ge \eta_1$ will be satisfied for
some $j$.


The proof of this theorem is given in Section 8. Theorem 5.1 shows that the
algorithm is well defined in the sense that it always finds an acceptable step from
any point that does not satisfy the termination criteria. From this theorem we
see that the algorithm cannot loop indefinitely without finding an acceptable
step.

Now,  we  state  our  main  global  convergence  result,  Theorem  5.2. 

Theorem  5.2 


Under the standard assumptions, the algorithm produces iterates $\{x_k\}$ which
satisfy

$$\liminf_{k \to \infty} \left[ \|h_k\|_2 + \|P_k \nabla f_k\|_2 \right] = 0.$$

The proof of this theorem is presented in Section 8. It is well known that a
point $x_* \in \mathbb{R}^n$ is a stationary point if and only if $h(x_*) = 0$ and
$P(x_*) \nabla f(x_*) = 0$. Theorem 5.2 shows that the algorithm will successfully
terminate. It means that the Celis-Dennis-Tapia trust-region algorithm for equality
constrained optimization generates at least a subsequence converging to a
stationary point of the problem.


6.  The  Decrease  in  The  Model 


All  results  in  this  section  deal  with  the  reduction  of  the  merit  function  and 
the  predicted  reduction  of  the  model. 

In the following lemma we use the fact that the step $s_k$ is chosen to give at
least as much decrease in the 2-norm of the linearized constraints as does the
Cauchy step $s_k^{cp}$.


Lemma  6.1 


There exist constants $b_1$ and $b_2$ independent of $k$ such that at the $k$-th
iteration the predicted decrease in the merit function given by the trial step satisfies

$$Pred_k \ge \frac{r_k}{2} \, b_1 \|h_k\|_2 \min\left[ \Delta_k, \frac{\|h_k\|_2}{b_2} \right].$$

Proof

First we prove that

$$\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \ge b_1 \|h_k\|_2 \min\left[ \Delta_k, \frac{\|h_k\|_2}{b_2} \right], \tag{6.1}$$

where $b_1$ and $b_2$ are constants independent of $k$.

When the TRQP step is used, inequality (6.1) is valid a fortiori.

Consider the case when either the QP step or the CDT step is used. From the
way of computing the step $s_k$ and using the fact that
$\|h_k\|_2 \|h_k + \nabla h_k^T s_k^{cp}\|_2 \le \|h_k\|_2^2$, we have

$$\begin{aligned} \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 &\ge \|h_k\|_2^2 - \theta_k^2 \\ &\ge r^2 \left[ \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k^{cp}\|_2^2 \right] \\ &= r^2 \left[ -2 h_k^T \nabla h_k^T s_k^{cp} - (s_k^{cp})^T \nabla h_k \nabla h_k^T s_k^{cp} \right]. \end{aligned}$$

Note that when the QP step is used, we have

$$\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 = \|h_k\|_2^2 \ge \|h_k\|_2^2 - \theta_k^2.$$
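The factor $r^2$ in the chain of inequalities can be traced to the choice (1.4) of $\theta_k$. The following short derivation is a sketch of that step, writing $c_k = h_k + \nabla h_k^T s_k^{cp}$ as a shorthand that is not used elsewhere in the paper, and using $\|c_k\|_2 \le \|h_k\|_2$:

```latex
\begin{aligned}
\theta_k^2 &= \bigl( (1-r)\|h_k\|_2 + r\|c_k\|_2 \bigr)^2 \\
           &= (1-r)^2 \|h_k\|_2^2 + 2r(1-r)\,\|h_k\|_2 \|c_k\|_2 + r^2 \|c_k\|_2^2 \\
           &\le (1-r)^2 \|h_k\|_2^2 + 2r(1-r)\,\|h_k\|_2^2 + r^2 \|c_k\|_2^2 \\
           &= (1-r^2)\,\|h_k\|_2^2 + r^2 \|c_k\|_2^2 ,
\end{aligned}
\qquad\text{so}\qquad
\|h_k\|_2^2 - \theta_k^2 \;\ge\; r^2 \bigl( \|h_k\|_2^2 - \|c_k\|_2^2 \bigr).
```

The inequality in the third line is exactly the stated fact $\|h_k\|_2 \|h_k + \nabla h_k^T s_k^{cp}\|_2 \le \|h_k\|_2^2$.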

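From the definition of $s_k^{cp}$, we have $s_k^{cp} = -\alpha_k \nabla h_k h_k$, where $\alpha_k$ is defined
by

$$\alpha_k = \frac{\Delta_k}{\|\nabla h_k h_k\|_2} \quad \text{if} \quad \frac{\|\nabla h_k h_k\|_2^3}{\|\nabla h_k^T \nabla h_k h_k\|_2^2} \ge \Delta_k, \tag{6.2-a}$$

otherwise,

$$\alpha_k = \frac{\|\nabla h_k h_k\|_2^2}{\|\nabla h_k^T \nabla h_k h_k\|_2^2}. \tag{6.2-b}$$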


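Consider the first case, i.e., the case when $s_k^{cp} = -\Delta_k \dfrac{\nabla h_k h_k}{\|\nabla h_k h_k\|_2}$. In this
case, using $\dfrac{\|\nabla h_k h_k\|_2^3}{\|\nabla h_k^T \nabla h_k h_k\|_2^2} \ge \Delta_k$, we have

$$\begin{aligned} \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 &\ge r^2 \left[ 2 \Delta_k \|\nabla h_k h_k\|_2 - \Delta_k^2 \frac{\|\nabla h_k^T \nabla h_k h_k\|_2^2}{\|\nabla h_k h_k\|_2^2} \right] \\ &\ge r^2 \left[ 2 \Delta_k \|\nabla h_k h_k\|_2 - \Delta_k \|\nabla h_k h_k\|_2 \right] \\ &= r^2 \Delta_k \|\nabla h_k h_k\|_2. \end{aligned} \tag{6.3}$$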


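Now, consider the second case. We have

$$\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \ge r^2 \left[ 2 \frac{\|\nabla h_k h_k\|_2^4}{\|\nabla h_k^T \nabla h_k h_k\|_2^2} - \frac{\|\nabla h_k h_k\|_2^4}{\|\nabla h_k^T \nabla h_k h_k\|_2^4} \|\nabla h_k^T \nabla h_k h_k\|_2^2 \right].$$

Hence,

$$\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \ge r^2 \frac{\|\nabla h_k h_k\|_2^4}{\|\nabla h_k^T \nabla h_k h_k\|_2^2} \ge r^2 \frac{\|\nabla h_k h_k\|_2^2}{\|\nabla h_k \nabla h_k^T\|_2}.$$

From the last inequality and (6.3), we can write

$$\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \ge r^2 \|\nabla h_k h_k\|_2 \min\left[ \Delta_k, \frac{\|\nabla h_k h_k\|_2}{\|\nabla h_k \nabla h_k^T\|_2} \right].$$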


Now, using the standard assumption 3, we have

$$\|\nabla h_k h_k\|_2 \ge \frac{\|h_k\|_2}{\|(\nabla h_k^T \nabla h_k)^{-1} \nabla h_k^T\|_2}.$$

The rest of the proof of (6.1) now follows from the standard assumption 4.

From the way of updating the penalty parameter $r_k$ in step 3 of the algorithm,
we have

$$Pred_k \ge \frac{r_k}{2} \left[ \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \right].$$

The proof of the lemma follows immediately from (6.1) and the last inequality.


Lemma 6.1 shows that the way we update the penalty parameter ensures
that the predicted reduction at each iteration will be at least as much as a
fraction of Cauchy decrease in the merit function. This indicates compatibility of the
step choice algorithm with the augmented Lagrangian merit function.


Lemma 6.2

Let $s_k^t$ and $s_k^n$ be the tangential and the normal components of the trial step $s_k$
generated by the algorithm. Then, $s_k^t$ satisfies

$$(\nabla_x l_k + B_k s_k^n)^T s_k^t \le -\frac{1}{2} \|P_k(\nabla_x l_k + B_k s_k^n)\|_2 \min\left[ \bar{\Delta}_k, \frac{\|P_k(\nabla_x l_k + B_k s_k^n)\|_2}{2 \|B_k\|_2} \right],$$

where $\bar{\Delta}_k = \sqrt{\Delta_k^2 - \|s_k^n\|_2^2}$.

Proof

The proof follows directly from Lemma 3.2 in Powell and Yuan (1986-b).

Lemma  6.3 

For any $x_k$, $x_k + s_k \in \Omega$, we have

$$|Ared_k - Pred_k| \le a_1 \|s_k\|_2^2 + r_k \left[ a_2 \|s_k\|_2^3 + a_3 \|h_k\|_2 \|s_k\|_2^2 \right],$$

where $a_1$, $a_2$, $a_3$ are constants independent of $k$.

Proof 


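From (2.1), (2.2) and the Cauchy-Schwarz inequality, we can write:

$$\begin{aligned} |Ared_k - Pred_k| \le {} & \left| l(x_k, \lambda_k) + \nabla_x l(x_k, \lambda_k)^T s_k + \tfrac{1}{2} s_k^T B_k s_k - l(x_k + s_k, \lambda_k) \right| \\ & + \left| \Delta\lambda_k^T \left[ h_k + \nabla h_k^T s_k - h(x_k + s_k) \right] \right| \\ & + r_k \left| \|h_k + \nabla h_k^T s_k\|_2^2 - \|h(x_k + s_k)\|_2^2 \right|. \end{aligned}$$

Hence,

$$\begin{aligned} |Ared_k - Pred_k| \le {} & \tfrac{1}{2} \left| s_k^T \left[ B_k - \nabla_x^2 l(x_k + \xi_1 s_k, \lambda_k) \right] s_k \right| \\ & + \tfrac{1}{2} \left| s_k^T \left[ \nabla^2 h(x_k + \xi_2 s_k) \Delta\lambda_k \right] s_k \right| \\ & + r_k \left| s_k^T \left[ \nabla h_k \nabla h_k^T - \nabla h(x_k + \xi_3 s_k) \nabla h(x_k + \xi_3 s_k)^T \right] s_k \right| \\ & + r_k \left| s_k^T \left[ \nabla^2 h(x_k + \xi_3 s_k) h(x_k + \xi_3 s_k) \right] s_k \right|, \end{aligned}$$

for some $\xi_1, \xi_2, \xi_3 \in (0, 1)$.

By using the standard assumptions 2, 4 and 5, the form of $\Delta\lambda_k$, and the fact
that $\|s_k\|_2 \le \Delta_k$, we get

$$|Ared_k - Pred_k| \le a_1 \|s_k\|_2^2 + a_2 r_k \|s_k\|_2^3 + a_3 r_k \|s_k\|_2^2 \|h_k\|_2,$$

which is the desired result. ■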

The result we obtained in the last lemma does not depend on any property of
the matrices $\{B_k\}$ except that their norms have a uniform upper bound, and does
not depend on any property of the steps except that they lie inside $\Omega$.

Corollary  6.4 

For any $x_k$, $x_k + s_k \in \Omega$, we have

$$|Ared_k - Pred_k| \le a_4 r_k \|s_k\|_2^2,$$

where $a_4$ is a constant independent of $k$.

Proof

The proof follows immediately from the last lemma, the fact that $r_k \ge 1$, the
fact that $\|s_k\|_2 \le \Delta_k$, and the standard assumption 4. ■

Corollary 6.4 shows that, if the penalty parameter is bounded, our definition
of predicted reduction implicitly gives an approximation to the merit function
that is accurate to within the square of the steplength.
Lemma 6.5


If $s_k$ is a trial step generated by the algorithm and $s_k^t$ is its tangential
component, then

$$(\nabla_x l_k + B_k s_k)^T s_k^t \le 0.$$


Proof 


If $s_k$ is the step obtained from the CDT subproblem, then

$$-(\nabla_x l_k + B_k s_k) = \mu s_k + \sigma \nabla h_k (h_k + \nabla h_k^T s_k),$$

where $\mu, \sigma \ge 0$. See Celis, Dennis, and Tapia (1985). Now

$$-P_k(\nabla_x l_k + B_k s_k) = \mu P_k s_k + \sigma P_k \left[ \nabla h_k (h_k + \nabla h_k^T s_k) \right] = \mu s_k^t.$$

Hence

$$-(\nabla_x l_k + B_k s_k)^T s_k^t = \mu \|s_k^t\|_2^2 \ge 0.$$

Now, assume that the step is generated from the TRQP subproblem. Then $s_k$
must satisfy

$$(B_k + \mu I) s_k = -(\nabla_x l_k + \nabla h_k \Delta\lambda_k),$$

where $\mu \ge 0$ with $\mu = 0$ if the step is generated from the QP subproblem, i.e., if
the trust region constraint is not binding. By multiplying by $P_k$, we obtain

$$P_k(\nabla_x l_k + B_k s_k) = -P_k \nabla h_k \Delta\lambda_k - \mu P_k s_k = -\mu s_k^t.$$

Hence,

$$(\nabla_x l_k + B_k s_k)^T s_k^t = -\mu \|s_k^t\|_2^2 \le 0.$$

This implies that in all cases the lemma is true. ■


Lemma  6.6 


There exists a constant $b_3$ such that, for all $k$, the normal component $s_k^n$
satisfies

$$\|s_k^n\|_2 \le b_3 \|h_k\|_2.$$

Proof

The proof follows directly from the standard assumptions 3 and 4. ■

Lemma  6.7 

Let $s_k$ be a step generated by the algorithm and let $s_k^t$ and $s_k^n$ be its tangential
and normal components respectively. Let $P_k$ be defined by (1.3), $\bar{\Delta}_k$ be as in
Lemma 6.2 and $\hat{h}_k = \nabla h_k (\nabla h_k^T \nabla h_k)^{-1} h_k$, then

$$\begin{aligned} Pred_k \ge {} & \frac{1}{4} \|P_k(\nabla_x l_k + B_k s_k^n)\|_2 \min\left[ \bar{\Delta}_k, \frac{\|P_k(\nabla_x l_k + B_k s_k^n)\|_2}{2 \|B_k\|_2} \right] \\ & - b_5 \|s_k\|_2 \|h_k\|_2 - \left| (\nabla_x l_k + B_k s_k)^T \hat{h}_k \right| \\ & + r_k \left[ \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \right], \end{aligned} \tag{6.4}$$

where $b_4$ and $b_5$ are constants independent of $k$.

Proof 

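From the definition of $Pred_k$ and $\Delta\lambda_k$, we can write

$$\begin{aligned} Pred_k = {} & -(\nabla_x l_k + B_k s_k)^T s_k + \tfrac{1}{2} s_k^T B_k s_k \\ & + (\nabla_x l_k + B_k s_k)^T \nabla h_k (\nabla h_k^T \nabla h_k)^{-1} (h_k + \nabla h_k^T s_k) \\ & + r_k \left[ \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \right]. \end{aligned}$$

Now, since $\nabla h_k (\nabla h_k^T \nabla h_k)^{-1} \nabla h_k^T s_k = s_k^n$, we can write:

$$\begin{aligned} Pred_k = {} & -(\nabla_x l_k + B_k s_k)^T s_k + \tfrac{1}{2} s_k^T B_k s_k + (\nabla_x l_k + B_k s_k)^T \left[ \hat{h}_k + s_k^n \right] \\ & + r_k \left[ \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \right]. \end{aligned}$$

Using $s_k - s_k^n = s_k^t$ and $-(\nabla_x l_k + B_k s_k)^T s_k^t \ge -\tfrac{1}{2} (\nabla_x l_k + B_k s_k)^T s_k^t$, we get

$$\begin{aligned} Pred_k \ge {} & -\tfrac{1}{2} (\nabla_x l_k + B_k s_k)^T s_k^t + \tfrac{1}{2} s_k^T B_k s_k + (\nabla_x l_k + B_k s_k)^T \hat{h}_k \\ & + r_k \left[ \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \right] \\ = {} & -\tfrac{1}{2} (\nabla_x l_k + B_k s_k^n)^T s_k^t - \tfrac{1}{2} (s_k^t)^T B_k s_k^t + \tfrac{1}{2} s_k^T B_k s_k \\ & + (\nabla_x l_k + B_k s_k)^T \hat{h}_k + r_k \left[ \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \right], \end{aligned}$$

which can be written as

$$\begin{aligned} Pred_k \ge {} & -\tfrac{1}{2} (\nabla_x l_k + B_k s_k^n)^T s_k^t + \left[ \tfrac{1}{2} s_k^n + s_k^t \right]^T B_k s_k^n \\ & + (\nabla_x l_k + B_k s_k)^T \hat{h}_k + r_k \left[ \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \right]. \end{aligned}$$

By using Lemma 6.2, we obtain

$$\begin{aligned} Pred_k \ge {} & \frac{1}{4} \|P_k(\nabla_x l_k + B_k s_k^n)\|_2 \min\left[ \bar{\Delta}_k, \frac{\|P_k(\nabla_x l_k + B_k s_k^n)\|_2}{2 \|B_k\|_2} \right] \\ & + \left[ \tfrac{1}{2} s_k^n + s_k^t \right]^T B_k s_k^n + (\nabla_x l_k + B_k s_k)^T \hat{h}_k \\ & + r_k \left[ \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \right]. \end{aligned}$$

But by Lemma 6.6 and the standard assumption 5, we can write

$$\begin{aligned} Pred_k \ge {} & \frac{1}{4} \|P_k(\nabla_x l_k + B_k s_k^n)\|_2 \min\left[ \bar{\Delta}_k, \frac{\|P_k(\nabla_x l_k + B_k s_k^n)\|_2}{2 \|B_k\|_2} \right] \\ & - b_4 b_3 \|s_k\|_2 \|h_k\|_2 - \left| (\nabla_x l_k + B_k s_k)^T \hat{h}_k \right| \\ & + r_k \left[ \|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2 \right], \end{aligned}$$

where $b_4$ is a constant such that $\|B_k\|_2 \le b_4$. Now, by setting $b_5 = b_4 b_3$,
we obtain the result.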
In order to prove that the algorithm is making an improvement in the merit
function, we have to prove that we will get a positive predicted reduction at each
iteration. Toward this end we must prove that the positive quantities in (6.4) are
greater than or equal to the absolute value of the negative quantities. If this is
not the case, then the algorithm, according to Scheme 2.3, will increase the
penalty parameter to ensure that this will be the case. First we need to derive an
upper bound on the third quantity. The following lemma will give us this bound.

Lemma  6.8 

Let $\hat h_k$ be as in Lemma 6.7; then there exist constants $a_5$ and $a_6$ such that

$$
\big|(\nabla l_k + B_k s_k)^T \hat h_k\big| \le \big[a_5\,\|s_k\|_2 + a_6\,\|s_{k-t_k}\|_2\big]\,\|h_k\|_2,
$$

where $s_{k-t_k}$ is the last acceptable step.

Proof 

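We have

$$
Q_k(\nabla l_k + B_k s_k) = Q_k \nabla f_k + Q_k \nabla h_k \lambda_k + Q_k B_k s_k.
$$

Now

$$
Q_k \nabla f_k = \nabla h_k(\nabla h_k^T \nabla h_k)^{-1}\nabla h_k^T \nabla f_k = -\nabla h_k \lambda_k^L,
$$

where $\lambda_k^L = -(\nabla h_k^T \nabla h_k)^{-1}\nabla h_k^T \nabla f_k$, and

$$
Q_k \nabla h_k = \nabla h_k(\nabla h_k^T \nabla h_k)^{-1}\nabla h_k^T \nabla h_k = \nabla h_k.
$$

Since $s_{k-t_k}$ is the last acceptable step, $x_{k-t_k+1} = x_k$ and $\lambda_{k-t_k+1} = \lambda_k$. We have

$$
Q_k \nabla h_k \lambda_k = \nabla h_k \lambda_k = \nabla h_k \lambda_{k-t_k+1}
= -\nabla h_k\big[(\nabla h_{k-t_k}^T \nabla h_{k-t_k})^{-1}\nabla h_{k-t_k}^T(\nabla f_{k-t_k} + B_{k-t_k} s_{k-t_k})\big]
$$
$$
= \nabla h_k\big[\lambda_{k-t_k}^L - (\nabla h_{k-t_k}^T \nabla h_{k-t_k})^{-1}\nabla h_{k-t_k}^T B_{k-t_k} s_{k-t_k}\big].
$$

This implies that

$$
\|Q_k(\nabla l_k + B_k s_k)\|_2 \le \|\nabla h_k(\lambda_k^L - \lambda_{k-t_k}^L)\|_2
+ b_1\,\|\nabla h_k\|_2\,\|B_{k-t_k}\|_2\,\|s_{k-t_k}\|_2 + \|B_k\|_2\,\|s_k\|_2, \tag{6.5}
$$

where $b_1$ is as in Lemma 6.1. Now by using the standard assumptions, there exists a constant $b_6$ such that

$$
\|\nabla h_k(\lambda_k^L - \lambda_{k-t_k}^L)\|_2 \le \|\nabla h_k\|_2\,\|\lambda_k^L - \lambda_{k-t_k}^L\|_2 \le b_6\,\|x_k - x_{k-t_k}\|_2,
$$

and since $x_k = x_{k-t_k+1}$, we have

$$
\|\nabla h_k(\lambda_k^L - \lambda_{k-t_k}^L)\|_2 \le b_6\,\|x_{k-t_k+1} - x_{k-t_k}\|_2 = b_6\,\|s_{k-t_k}\|_2. \tag{6.6}
$$

Substituting (6.6) in (6.5) and using the standard assumptions 4 and 5, we obtain

$$
\|Q_k(\nabla l_k + B_k s_k)\|_2 \le b_7\,\|s_k\|_2 + b_8\,\|s_{k-t_k}\|_2, \tag{6.7}
$$

where $b_7$ and $b_8$ are constants independent of $k$.

Since $Q_k \hat h_k = \hat h_k$, we have

$$
\big|(\nabla l_k + B_k s_k)^T \hat h_k\big| = \big|[Q_k(\nabla l_k + B_k s_k)]^T \hat h_k\big| \le \|Q_k(\nabla l_k + B_k s_k)\|_2\,\|\hat h_k\|_2.
$$

Now, by using (6.7) and the fact that $\|\hat h_k\|_2 \le b_9\,\|h_k\|_2$, where

$$
b_9 = \sup_{x \in \Omega} \|\nabla h(x)(\nabla h(x)^T \nabla h(x))^{-1}\|_2,
$$

the proof follows immediately. ■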
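The projector identities used in this proof are easy to check numerically. The following is a minimal sketch, not part of the original analysis: the sizes `n` and `m`, the random data, and the variable names are all assumptions made for illustration. Here `A` stands in for $\nabla h_k$, `g` for $\nabla l_k + B_k s_k$, and `h_hat` for the vector $\nabla h_k(\nabla h_k^T \nabla h_k)^{-1} h_k$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2                       # hypothetical sizes: n variables, m constraints

A = rng.standard_normal((n, m))   # stands in for grad h_k (n-by-m, full column rank)
h = rng.standard_normal(m)        # stands in for the constraint values h_k
g = rng.standard_normal(n)        # stands in for grad l_k + B_k s_k

AtA_inv = np.linalg.inv(A.T @ A)
Q = A @ AtA_inv @ A.T             # orthogonal projector onto the range of grad h_k
P = np.eye(n) - Q                 # orthogonal projector onto the null space of grad h_k^T
h_hat = A @ AtA_inv @ h           # grad h_k (grad h_k^T grad h_k)^{-1} h_k

# Cauchy-Schwarz step from the end of the proof: |g^T h_hat| <= ||Q g|| ||h_hat||
lhs = abs(g @ h_hat)
rhs = np.linalg.norm(Q @ g) * np.linalg.norm(h_hat)

# Local value of the matrix norm whose supremum over the domain defines b_9
b9_local = np.linalg.norm(A @ AtA_inv, 2)
```

Because `h_hat` lies in the range of `A`, applying `Q` leaves it unchanged, which is what allows `g` to be replaced by `Q @ g` inside the inner product.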
The following lemma proves the important property that if $\|h_k\|_2$ is
small enough, then the penalty parameter will not be increased in step 3 of the
algorithm.

Lemma  6.9 


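Let $k$ index an iteration at which the algorithm does not terminate. If $\|h_k\|_2 \le c_1 \Delta_k$, where $c_1$ is a small constant that satisfies

$$
c_1 \le \min\Big[\frac{\sqrt3}{2 b_3},\ \frac{\epsilon}{3\Delta_*},\ \frac{\epsilon}{3 b_5 \Delta_*},\ \frac{\epsilon}{48\,(a_5 + b_5 + a_6)\,\Delta_*}\,\min\Big(1,\ \frac{\epsilon}{3 b_4 \Delta_*}\Big)\Big], \tag{6.8}
$$

where $a_5$ and $a_6$ are as in Lemma 6.8, $b_3$ is as in Lemma 6.6, $b_4$ and $b_5$ are as in Lemma 6.7, and $\Delta_*$ is the upper bound on the trust region radius, then

$$
\mathrm{Pred}_k \ge \frac{r_k}{2}\big\{\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\big\}
+ \tfrac18\,\|P_k(\nabla l_k + B_k s_k^n)\|_2 \min\Big[\tfrac12 \Delta_k,\ \frac{\|P_k(\nabla l_k + B_k s_k^n)\|_2}{2 b_4}\Big].
$$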


Proof 


If $k$ is the index of an iteration at which the algorithm does not terminate, then

$$
\|P_k \nabla l_k\|_2 + \|h_k\|_2 \ge \epsilon.
$$

Now

$$
\|P_k(\nabla l_k + B_k s_k^n)\|_2 \ge \|P_k \nabla l_k\|_2 - \|P_k B_k s_k^n\|_2 \ge \|P_k \nabla l_k\|_2 - b_5\,\|h_k\|_2. \tag{6.9}
$$

But, since $\|h_k\|_2 \le \tfrac13 \epsilon$, it follows that $\|P_k \nabla l_k\|_2 \ge \tfrac23 \epsilon$. We have

$$
\|P_k(\nabla l_k + B_k s_k^n)\|_2 \ge \tfrac23 \epsilon - b_5 c_1 \Delta_* \ge \tfrac13 \epsilon. \tag{6.10}
$$

Now, from Lemma 6.7, Lemma 6.8 and $\|h_k\|_2 \le c_1 \Delta_k$, we obtain

$$
\mathrm{Pred}_k \ge \tfrac14\,\|P_k(\nabla l_k + B_k s_k^n)\|_2 \min\Big[\bar\Delta_k,\ \frac{\|P_k(\nabla l_k + B_k s_k^n)\|_2}{2 b_4}\Big]
- c_1\big[b_5\,\|s_k\|_2 + (a_5\,\|s_k\|_2 + a_6\,\|s_{k-t_k}\|_2)\big]\Delta_k
+ r_k\big\{\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\big\}. \tag{6.11}
$$

So, by using (6.10), we can write

$$
\mathrm{Pred}_k \ge \tfrac18\,\|P_k(\nabla l_k + B_k s_k^n)\|_2 \min\Big[\bar\Delta_k,\ \frac{\|P_k(\nabla l_k + B_k s_k^n)\|_2}{2 b_4}\Big]
+ \tfrac18\Big(\frac{\epsilon}{3}\Big)\min\Big[\bar\Delta_k,\ \frac{\epsilon}{6 b_4}\Big]
- c_1\big[(a_5 + b_5 + a_6)\Delta_*\big]\Delta_k
+ r_k\big[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\big]. \tag{6.12}
$$

Now, since $\bar\Delta_k = \sqrt{\Delta_k^2 - \|s_k^n\|_2^2}$, and by using Lemma 6.6 and $\|h_k\|_2 \le \frac{\sqrt3}{2 b_3}\Delta_k$, we obtain

$$
\bar\Delta_k \ge \sqrt{\Delta_k^2 - (3/4)\Delta_k^2} = \tfrac12 \Delta_k.
$$

By substituting the last inequality in (6.12), we obtain

$$
\mathrm{Pred}_k \ge \tfrac18\,\|P_k(\nabla l_k + B_k s_k^n)\|_2 \min\Big[\tfrac12\Delta_k,\ \frac{\|P_k(\nabla l_k + B_k s_k^n)\|_2}{2 b_4}\Big]
+ \tfrac18\Big(\frac{\epsilon}{3}\Big)\min\Big[\tfrac12\Delta_k,\ \frac{\epsilon}{6 b_4}\Big]
- c_1\big[(a_5 + b_5 + a_6)\Delta_*\big]\Delta_k
+ r_k\big[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\big].
$$

Since $c_1$ satisfies inequality (6.8), we have

$$
\mathrm{Pred}_k \ge \tfrac18\,\|P_k(\nabla l_k + B_k s_k^n)\|_2 \min\Big[\tfrac12\Delta_k,\ \frac{\|P_k(\nabla l_k + B_k s_k^n)\|_2}{2 b_4}\Big]
+ \frac{r_k}{2}\big[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\big]. \tag{6.13}
$$

This is the desired result.

If $\|h_k\|_2 \le c_1 \Delta_k$, then half of the first term in (6.11) will cancel the second and the third terms, and the fourth term need never enter into the calculation. This implies that if we set $r_k = r_{k-1}$, inequality (6.13) remains valid. So, in this case, the algorithm will not increase the penalty parameter.

Lemma  6.10 

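Let $k$ be the index of an iteration at which the algorithm does not terminate. If $\|h_k\|_2 \le c_1 \Delta_k$, where $c_1$ is as in Lemma 6.9, then there exists a constant $c_2$ such that

$$
\mathrm{Pred}_k \ge c_2 \Delta_k.
$$

Proof

From (6.10) and (6.13), we have

$$
\mathrm{Pred}_k \ge \frac{\epsilon}{24}\min\Big[\tfrac12\Delta_k,\ \frac{\epsilon}{6 b_4}\Big]
\ge \frac{1}{48}\,\epsilon\,\min\Big[1,\ \frac{\epsilon}{3 b_4 \Delta_*}\Big]\Delta_k.
$$

The result now follows if we set

$$
c_2 = \frac{1}{48}\,\epsilon\,\min\Big[1,\ \frac{\epsilon}{3 b_4 \Delta_*}\Big].
$$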
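To see how the constants of Lemmas 6.9 and 6.10 fit together, the sketch below evaluates the bound on $c_1$ from (6.8) and the constant $c_2$ for a set of made-up values. The formulas are the ones read from (6.8) and from the proof of Lemma 6.10; the numerical values of $\epsilon$, $\Delta_*$, $a_5$, $a_6$, $b_3$, $b_4$ and the function names are assumptions chosen only for illustration.

```python
import math

def c1_bound(eps, delta_star, a5, a6, b3, b4, b5):
    """Right-hand side of (6.8): the largest admissible value of c_1."""
    return min(
        math.sqrt(3.0) / (2.0 * b3),
        eps / (3.0 * delta_star),
        eps / (3.0 * b5 * delta_star),
        eps / (48.0 * (a5 + b5 + a6) * delta_star)
        * min(1.0, eps / (3.0 * b4 * delta_star)),
    )

def c2_value(eps, delta_star, b4):
    """The constant c_2 of Lemma 6.10."""
    return eps / 48.0 * min(1.0, eps / (3.0 * b4 * delta_star))

# Hypothetical constants, chosen only for illustration.
eps, delta_star = 1e-2, 10.0
a5, a6, b3, b4 = 2.0, 1.5, 4.0, 5.0
b5 = b4 * b3                      # b_5 = b_4 b_3, as in the proof of Lemma 6.7

c1 = c1_bound(eps, delta_star, a5, a6, b3, b4, b5)
c2 = c2_value(eps, delta_star, b4)
```

With these definitions, the last entry of the minimum in (6.8) is exactly $c_2 / [(a_5 + b_5 + a_6)\Delta_*]$, which is why the negative term in (6.12) can be absorbed.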


7.  The  Behavior  of  The  Penalty  Parameter 

This  section  is  devoted  to  the  study  of  the  behavior  of  the  penalty  parameter. 
Our  objective  is  to  prove  that  the  penalty  parameter  is  bounded.  This  will  imply 
that  rk  is  fixed  for  k  sufficiently  large.  This  result  is  very  important  in  proving 
global  convergence  of  the  algorithm. 


Lemma 7.1

If $k$ is the index of an iteration at which the penalty parameter $r_k$ is increased, then

$$
r_k \min\Big[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\Big] \le a_7\,\|s_k\|_2 + a_8\,\|s_{k-t_k}\|_2,
$$

where $a_7$ and $a_8$ are constants independent of $k$ and $s_{k-t_k}$ is the last acceptable step.

Proof 


Let $k$ be the index of an iteration at which the penalty parameter is increased; then by step 3 of the algorithm $r_k$ is updated by the following rule:

$$
r_k = \frac{2\big[\nabla l_k^T s_k + \tfrac12 s_k^T B_k s_k + \Delta\lambda_k^T(h_k + \nabla h_k^T s_k)\big]}{\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2} + \rho.
$$

This can be written as

$$
\frac{r_k}{2}\big[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\big]
= (\nabla l_k + B_k s_k)^T s_k - \tfrac12 s_k^T B_k s_k - (\nabla l_k + B_k s_k)^T(\hat h_k + s_k^n)
+ \frac{\rho}{2}\big[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\big]. \tag{7.1}
$$

Thus, from (7.1), (6.1), and Lemma 6.5,

$$
\frac{r_k\,\|h_k\|_2}{2 b_1}\min\Big[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\Big]
\le -(s_k^t)^T B_k s_k^n - \tfrac12 (s_k^n)^T B_k s_k^n - (\nabla l_k + B_k s_k)^T \hat h_k - \rho\, h_k^T \nabla h_k^T s_k,
$$

and we can write

$$
\frac{r_k\,\|h_k\|_2}{2 b_1}\min\Big[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\Big]
\le \|B_k\|_2\,\|s_k\|_2\,\|s_k^n\|_2 + \rho\,\|\nabla h_k\|_2\,\|s_k\|_2\,\|h_k\|_2 + \big|(\nabla l_k + B_k s_k)^T \hat h_k\big|. \tag{7.2}
$$

Now by using Lemma 6.8,

$$
\frac{r_k\,\|h_k\|_2}{2 b_1}\min\Big[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\Big]
\le \|B_k\|_2\,\|s_k\|_2\,\|s_k^n\|_2 + \big(a_5\,\|s_k\|_2 + a_6\,\|s_{k-t_k}\|_2\big)\|h_k\|_2 + \rho\,\|\nabla h_k\|_2\,\|s_k\|_2\,\|h_k\|_2.
$$

But, by Lemma 6.6, $\|s_k^n\|_2 \le b_3\,\|h_k\|_2$, and from the standard assumption 4, $\|\nabla h_k\|_2 \le b_{10}$ where $b_{10} = \sup_{x \in \Omega}\|\nabla h(x)\|_2$, so

$$
\frac{r_k\,\|h_k\|_2}{2 b_1}\min\Big[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\Big]
\le (b_4 b_3 + a_5 + \rho\, b_{10})\,\|s_k\|_2\,\|h_k\|_2 + a_6\,\|s_{k-t_k}\|_2\,\|h_k\|_2.
$$

The result follows immediately upon dividing by $\dfrac{\|h_k\|_2}{2 b_1}$.
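The update rule quoted at the start of this proof can be sketched in a few lines. Scheme 2.3 itself lies outside this section, so the trigger used below (increase $r_k$ only when the predicted reduction falls below half of the penalty term) is an assumption modelled on inequality (6.13); the function name, argument names and numbers are hypothetical.

```python
def update_penalty(q, d, r_prev, rho):
    """Return the new penalty parameter.

    q      : grad l^T s + 0.5 s^T B s + dlambda^T (h + grad h^T s)
    d      : ||h||^2 - ||h + grad h^T s||^2   (assumed positive)
    r_prev : penalty parameter from the previous iteration
    rho    : fixed positive increment
    """
    pred = -q + r_prev * d                 # predicted reduction with r = r_prev
    if pred >= 0.5 * r_prev * d:
        return r_prev                      # reduction is already large enough
    return 2.0 * q / d + rho               # rule quoted in the proof of Lemma 7.1

# Hypothetical numbers for illustration.
q, d, r_prev, rho = 3.0, 0.5, 1.0, 0.1
r_new = update_penalty(q, d, r_prev, rho)
pred_new = -q + r_new * d                  # predicted reduction after the update
```

After an increase, `pred_new` equals `q + rho * d`, while half of the penalty term equals `q + 0.5 * rho * d`, so the predicted reduction dominates it. The same algebra shows that every increase exceeds `rho`.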


Corollary  7.2 


If $k$ is the index of an iteration at which the algorithm does not terminate and the penalty parameter $r_k$ is increased, then

$$
r_k \Delta_k \le a_9\,\|s_k\|_2 + a_{10}\,\|s_{k-t_k}\|_2,
$$

where $a_9$ and $a_{10}$ are constants independent of $k$ and $s_{k-t_k}$ is the last acceptable step.

Proof 


From Lemma 7.1, if $k$ is the index of an iteration at which the penalty parameter $r_k$ increases, then $r_k$ must satisfy the following inequality:

$$
r_k \min\Big[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\Big] \le a_7\,\|s_k\|_2 + a_8\,\|s_{k-t_k}\|_2.
$$

From Lemma 6.9, if $\|h_k\|_2 \le c_1 \Delta_k$, then we do not increase $r_k$. So, for any iteration at which the penalty parameter increases, we must have

$$
\|h_k\|_2 > c_1 \Delta_k,
$$

and we obtain

$$
r_k \Delta_k \min\Big[1,\ \frac{c_1}{b_2}\Big] \le a_7\,\|s_k\|_2 + a_8\,\|s_{k-t_k}\|_2.
$$

Hence,

$$
r_k \Delta_k \le a_9\,\|s_k\|_2 + a_{10}\,\|s_{k-t_k}\|_2,
$$

and we get the desired result.


Under the standard assumptions, at each iteration at which the termination criterion is not satisfied and the penalty parameter is increased, $r_k \Delta_k$ is bounded. However, if we can bound $\|s_{k-t_k}\|_2 / \Delta_k$ by a constant independent of $k$, we obtain an upper bound on $r_k$ itself. In the following lemma we derive a relation between $\|s_{k-t_k}\|_2$ and $\Delta_k$. In Lemma 7.4 we prove that the penalty parameter is bounded.


Lemma  7.3 

Let $k$ be the index of any iteration at which the algorithm does not terminate and the penalty parameter $r_k$ is increased; then

$$
\Delta_k \ge c_3\,\|s_{k-t_k}\|_2,
$$

where $s_{k-t_k}$ is the last acceptable step and $c_3$ is a constant independent of $k$ and $t_k$.

Proof 


We consider three cases:

First, if $t_k = 1$, i.e., $s_{k-1}$ is the last acceptable step, then from (2.3), we have

$$
\Delta_k \ge \alpha_1\,\|s_{k-1}\|_2.
$$

The result in this case follows if we set $c_3 = \alpha_1$.

Second, if $s_{k-1}$ is not the last acceptable step and at the same time $\|h_{k-i}\|_2 > c_1 \Delta_{k-i}$ for all $i \in [1, t_k - 1]$, in this case, from Corollary 6.4, we have

$$
|\mathrm{Ared}_{k-i} - \mathrm{Pred}_{k-i}| \le a_4\, r_{k-i}\,\|s_{k-i}\|_2^2.
$$

Now, from Lemma 6.1, we have

$$
\mathrm{Pred}_{k-i} \ge \frac{r_{k-i}\,\|h_{k-i}\|_2}{2 b_1}\min\Big[\Delta_{k-i},\ \frac{\|h_{k-i}\|_2}{b_2}\Big].
$$

But since all $k-i$, $i = 1, \ldots, t_k - 1$, satisfy $\|h_{k-i}\|_2 > c_1 \Delta_{k-i} \ge c_1\,\|s_{k-i}\|_2$, we have

$$
\mathrm{Pred}_{k-i} \ge \frac{r_{k-i}\,\|h_{k-i}\|_2}{2 b_1 b_2}\,\|s_{k-i}\|_2 \min[\,b_2,\ c_1\,].
$$

Hence,

$$
\frac{|\mathrm{Ared}_{k-i} - \mathrm{Pred}_{k-i}|}{\mathrm{Pred}_{k-i}} \le \frac{2 a_4 b_1 b_2\,\|s_{k-i}\|_2}{\min[\,b_2,\ c_1\,]\,\|h_{k-i}\|_2}.
$$

But since all $k-i$, $i = 1, \ldots, t_k - 1$, index unacceptable steps, we have

$$
(1 - \eta_1) < \Big|\frac{\mathrm{Ared}_{k-i}}{\mathrm{Pred}_{k-i}} - 1\Big|, \qquad 1 \le i \le t_k - 1.
$$

So, for all $i \in [1, t_k - 1]$, we have

$$
\|s_{k-i}\|_2 > \frac{(1 - \eta_1)}{2 a_4 b_1 b_2}\min[\,b_2,\ c_1\,]\,\|h_{k-i}\|_2.
$$

Now, since $x_{k-1} = x_{k-(t_k-1)}$, $h_{k-1} = h_{k-(t_k-1)}$, we have

$$
\Delta_k \ge \alpha_1\,\|s_{k-1}\|_2
\ge \frac{\alpha_1 (1 - \eta_1)}{2 a_4 b_1 b_2}\min[\,b_2,\ c_1\,]\,\|h_{k-1}\|_2
= \frac{\alpha_1 (1 - \eta_1)}{2 a_4 b_1 b_2}\min[\,b_2,\ c_1\,]\,\|h_{k-(t_k-1)}\|_2
$$
$$
\ge \frac{\alpha_1 c_1 (1 - \eta_1)}{2 a_4 b_1 b_2}\min[\,b_2,\ c_1\,]\,\Delta_{k-(t_k-1)}
\ge \frac{\alpha_1^2 c_1 (1 - \eta_1)}{2 a_4 b_1 b_2}\min[\,b_2,\ c_1\,]\,\|s_{k-t_k}\|_2.
$$

The result in this case follows by setting

$$
c_3 = \frac{\alpha_1^2 c_1 (1 - \eta_1)}{2 a_4 b_1 b_2}\min[\,b_2,\ c_1\,].
$$

Finally, if the step indexed by $k-1$ is not the last acceptable step and not all $i \in [1, t_k - 1]$ satisfy $\|h_{k-i}\|_2 > c_1 \Delta_{k-i}$, then there exists at least one $j \in [1, t_k - 1]$ such that $\|h_{k-j}\|_2 \le c_1 \Delta_{k-j}$. Let $l$ be the smallest integer in $[1, t_k - 1]$ such that $\|h_{k-l}\|_2 \le c_1 \Delta_{k-l}$. For all $i \in [1, l - 1]$, we have

$$
\|h_{k-i}\|_2 > c_1 \Delta_{k-i}.
$$

As in the first two parts, if we set

$$
c_4 = \min\Big[\alpha_1,\ \frac{\alpha_1^2 c_1 (1 - \eta_1)}{2 a_4 b_1 b_2}\min(\,b_2,\ c_1\,)\Big], \tag{7.3}
$$

we obtain

$$
\Delta_k \ge c_4\,\|s_{k-l}\|_2, \tag{7.4}
$$

where $c_4$ is given by (7.3). Now, for $k-l$ we have

$$
\|h_{k-l}\|_2 \le c_1 \Delta_{k-l}. \tag{7.5}
$$

From Lemma 6.3 (where we replace $k$ by $k-l$), the inequality $\|s_{k-l}\|_2 \le \Delta_{k-l}$ and the inequality (7.5), we have

$$
|\mathrm{Ared}_{k-l} - \mathrm{Pred}_{k-l}| \le a_1\,\|s_{k-l}\|_2^2 + r_{k-l}\,(a_2 + a_3 c_1)\,\|s_{k-l}\|_2^2\,\Delta_{k-l}. \tag{7.6}
$$

If $k$ indexes an iteration at which $r_k$ is increased, then from Corollary 7.2 and the standard assumptions we know that $r_k \Delta_k$ is bounded. By using inequality (7.4), we arrive at

$$
r_{k-l}\,\|s_{k-l}\|_2 \le \frac{1}{c_4}\, r_{k-l}\,\Delta_k \le \frac{1}{c_4}\, r_k \Delta_k \le m_0,
$$

where $m_0$ is a uniform bound. Hence inequality (7.6) can be written as

$$
|\mathrm{Ared}_{k-l} - \mathrm{Pred}_{k-l}| \le a_1\,\|s_{k-l}\|_2^2 + (a_2 + c_1 a_3)\, m_0\,\|s_{k-l}\|_2\,\Delta_{k-l}
\le \big[a_1 + (a_2 + c_1 a_3)\, m_0\big]\,\|s_{k-l}\|_2\,\Delta_{k-l}.
$$

By using Lemma 6.10, we have

$$
\frac{|\mathrm{Ared}_{k-l} - \mathrm{Pred}_{k-l}|}{\mathrm{Pred}_{k-l}} \le \frac{a_1 + (a_2 + c_1 a_3)\, m_0}{c_2}\,\|s_{k-l}\|_2.
$$

But since the $(k-l)$th step is not an acceptable step, then

$$
(1 - \eta_1) < \Big|\frac{\mathrm{Ared}_{k-l}}{\mathrm{Pred}_{k-l}} - 1\Big| \le \frac{a_1 + (a_2 + c_1 a_3)\, m_0}{c_2}\,\|s_{k-l}\|_2.
$$

Hence, by using inequality (7.4), we obtain

$$
\Delta_k \ge c_4\,\|s_{k-l}\|_2 \ge \frac{c_2 c_4}{[a_1 + (a_2 + c_1 a_3)\, m_0]}\,(1 - \eta_1)
\ge \frac{c_2 c_4 (1 - \eta_1)}{[a_1 + (a_2 + c_1 a_3)\, m_0]\,\Delta_*}\,\|s_{k-t_k}\|_2.
$$

The result then follows if we set

$$
c_3 = \min\Big[c_4,\ \frac{c_2 c_4 (1 - \eta_1)}{[a_1 + (a_2 + c_1 a_3)\, m_0]\,\Delta_*}\Big].
$$

This completes the proof.


The  following  lemma  uses  Corollary  7.2  and  Lemma  7.3  to  prove  that  if  each 
member  of  the  sequence  of  iterates  generated  by  the  algorithm  does  not  satisfy  the 
termination  condition  in  step  1  of  the  algorithm,  then  the  penalty  parameter  is 
bounded. 

Lemma  7.4 

Under the standard assumptions, if each member of the sequence of iterates generated by the algorithm does not satisfy the termination condition (3.1), then the penalty parameter sequence $\{r_k\}$ is bounded.

Proof 

The proof is by contradiction. Suppose that $\{r_k\}$ is not bounded. This implies that there exists an infinite subsequence of indices $\{k_j\}$ such that $\{r_{k_j}\}$ is increased. Now, from Lemma 6.9, we never increase the penalty parameter if $\|h_k\|_2 \le c_1 \Delta_k$. So, $\|h_{k_j}\|_2 > c_1 \Delta_{k_j}$.

Let $m$ be any integer in $\{k_j\}$; then from Corollary 7.2 we can write

$$
r_m \Delta_m \le a_9\,\|s_m\|_2 + a_{10}\,\|s_{m-t_m}\|_2, \tag{7.7}
$$

where $s_{m-t_m}$ is the last acceptable step. On the other hand, from Lemma 7.3 we have

$$
\|s_{m-t_m}\|_2 \le \frac{1}{c_3}\,\Delta_m.
$$

By substituting the last inequality in (7.7), and using $\|s_m\|_2 \le \Delta_m$, we get

$$
r_m \le a_9 + \frac{a_{10}}{c_3}.
$$

Since $a_9 + \frac{a_{10}}{c_3}$ is independent of $m$, it is an upper bound of the sequence $\{r_{k_j}\}$, and hence of $\{r_k\}$, contradicting the assumption that the sequence $\{r_k\}$ is not bounded. This proves the lemma. ■

From the last lemma, we can conclude that for all $k$, $1 \le r_k \le r_*$, where $r_*$ is a constant independent of $k$.

Since if $r_k$ is increased, it is increased by a quantity $\ge \rho$, then the number of iterations at which the penalty parameter is increased must be finite. Hence, there exists a constant $\bar k$ such that

$$
r_k = r_{\bar k} \quad \text{for all } k \ge \bar k. \tag{7.8}
$$
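The counting argument behind (7.8) can be made explicit. As a sketch, write $N$ for the number of iterations at which the penalty parameter has been increased so far ($N$ is notation introduced here, not in the original). Each increase adds at least $\rho$, the sequence starts at a value of at least 1, and it never exceeds $r_*$, so:

```latex
1 + N\rho \;\le\; r_k \;\le\; r_*
\quad\Longrightarrow\quad
N \;\le\; \frac{r_* - 1}{\rho}.
```

The bound on $N$ does not depend on $k$, which is why the penalty parameter must eventually stay fixed.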

8.  The  Global  Convergence  Theory 

In  this  section  we  present  the  proofs  of  our  main  global  convergence  results 
that  have  been  stated  in  Section  5.  We  start  by  restating  and  then  proving 
Theorem  5.1.  First  we  introduce  some  notation  that  will  be  used  in  the  remainder 
of  this  paper. 

We call an iteration a successful iteration if the trial step of that iteration was accepted because $\dfrac{\mathrm{Ared}_k}{\mathrm{Pred}_k} \ge \eta_1$. Otherwise, the iteration is said to be unsuccessful.

We denote by $S(k_1, k_2)$ the set of indices of successful iterations in the interval $[k_1, k_2]$.


Theorem 5.1

Under the standard assumptions, at any point $(x_k, \lambda_k)$ generated by the algorithm, either the termination condition of the algorithm will be met or an acceptable step will be found, i.e. the condition $\dfrac{\mathrm{Ared}_{k+j}}{\mathrm{Pred}_{k+j}} \ge \eta_1$ will be satisfied for some $j$.

Proof 


If the termination condition of the algorithm is satisfied, then there is nothing to prove. Assume that the point $(x_k, \lambda_k)$ does not satisfy the termination condition in step 1 of the algorithm.

First, we assume that $\|h_k\|_2 > c_1 \Delta_k$ where $c_1$ is as in Lemma 6.9. Using Lemma 6.1 we obtain

$$
\mathrm{Pred}_k \ge \frac{r_k\,\|h_k\|_2}{2 b_1 b_2}\min[\,b_2,\ c_1\,]\,\Delta_k;
$$

then, using Corollary 6.4, we obtain

$$
\Big|\frac{\mathrm{Ared}_k}{\mathrm{Pred}_k} - 1\Big| \le \frac{2 a_4 b_1 b_2\,\Delta_k}{\|h_k\|_2 \min[\,b_2,\ c_1\,]}.
$$

Now, as $\Delta_k$ gets smaller, the quantity $\Big|\dfrac{\mathrm{Ared}_k}{\mathrm{Pred}_k} - 1\Big|$ approaches 0 and hence the condition $\dfrac{\mathrm{Ared}_k}{\mathrm{Pred}_k} \ge \eta_1$ will be met after a finite number of trials.

Now, assume that $\|h_k\|_2 \le c_1 \Delta_k$. Using Corollary 6.4, Lemma 6.10 and $r_k \le r_*$, we can write

$$
\frac{|\mathrm{Ared}_k - \mathrm{Pred}_k|}{\mathrm{Pred}_k} \le \frac{a_4 r_*}{c_2}\,\Delta_k.
$$

So, as $\Delta_k$ gets smaller, the quantity $\Big|\dfrac{\mathrm{Ared}_k}{\mathrm{Pred}_k} - 1\Big|$ approaches 0, and hence the condition $\dfrac{\mathrm{Ared}_k}{\mathrm{Pred}_k} \ge \eta_1$ will be met after a finite number of trials. This completes the proof. ■
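The mechanism in this proof (the relative error between actual and predicted reduction shrinks with the radius, so repeated shrinking must eventually produce an acceptable step) can be illustrated on a one-dimensional toy problem. This is only a sketch under stated assumptions: the objective $f(x) = e^x - 2x$, the linear model, the halving factor and the threshold `eta1 = 0.75` are all hypothetical choices and are not the constants or the subproblem of the algorithm analysed here.

```python
import math

def find_acceptable_radius(delta, eta1=0.75, shrink=0.5, max_trials=50):
    """Shrink the radius until Ared / Pred >= eta1 on a 1-D toy problem.

    Toy objective: f(x) = exp(x) - 2x at the point x = 0, where f'(0) = -1.
    The linear model predicts a reduction of `delta` for the step s = delta.
    """
    f = lambda x: math.exp(x) - 2.0 * x
    for _ in range(max_trials):
        pred = delta                      # model reduction: -f'(0) * delta
        ared = f(0.0) - f(delta)          # actual reduction for the trial step
        if ared / pred >= eta1:
            return delta, ared / pred     # acceptable step found
        delta *= shrink                   # reject the step and shrink the region
    raise RuntimeError("no acceptable step found")

delta, ratio = find_acceptable_radius(2.0)
```

In this toy setting $|\mathrm{Ared} - \mathrm{Pred}| = e^{\Delta} - 1 - \Delta$, which behaves like $\Delta^2/2$ for small $\Delta$, while $\mathrm{Pred} = \Delta$; the ratio therefore tends to 1 as the radius shrinks, mirroring the two bounds used in the proof.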


The proof of our main global convergence result, Theorem 5.2, uses the following two lemmas. The first lemma proves that under the standard assumptions, either the algorithm terminates, or it converges to a feasible point. The second lemma proves that under the standard assumptions, either the algorithm terminates, or $\|P_k \nabla l_k\|_2 < \epsilon$ for some $k$ sufficiently large, where $\epsilon > 0$ is any given constant. This means that if each member of the sequence of iterates generated by the algorithm does not satisfy the termination condition (3.1), then the sequence $\{\|P_k \nabla l_k\|_2\}$ will not be bounded away from zero.

Lemma  8.1 


Let the standard assumptions hold. If each member of the sequence of iterates generated by the algorithm does not satisfy the termination condition (3.1), then

$$
\lim_{k \to \infty} \|h_k\|_2 = 0.
$$

Proof 


Suppose $\limsup_{k \to \infty} \|h_k\|_2 = \epsilon_0 > 0$. Then there exists an infinite sequence of indices $\{k_j\}$ such that $\|h_k\|_2 \ge \frac{\epsilon_0}{2}$ for all $k \in \{k_j\}$.

Let $\hat k$ be such that $\hat k \in \{k_j\}$, $\hat k \ge \bar k$, where $\bar k$ is the same as in (7.8). Since $h \in C^2$, we have that for some $\beta > 0$ and any $x \in \Omega$

$$
\|h(x)\|_2 \ge \|h_{\hat k}\|_2 - \|h(x) - h_{\hat k}\|_2 \ge \|h_{\hat k}\|_2 - \beta\,\|x - x_{\hat k}\|_2.
$$

This implies that for all $x$ that satisfy $\|x - x_{\hat k}\|_2 \le \dfrac{\|h_{\hat k}\|_2}{2\beta}$, we have

$$
\|h(x)\|_2 \ge \frac{\|h_{\hat k}\|_2}{2}.
$$

Let $\sigma = \dfrac{\|h_{\hat k}\|_2}{2\beta}$ and consider the ball

$$
B_\sigma = \{\, x : \|x - x_{\hat k}\|_2 \le \sigma \,\}.
$$

First we will show that eventually the iterate must move outside $B_\sigma$. If $x_k \in B_\sigma$ for all $k \ge \hat k$, then from Lemma 6.1 and $r_k \ge 1$,

$$
\mathrm{Pred}_k \ge \frac{\|h_k\|_2}{2 b_1}\min\Big[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\Big]
\ge \frac{\|h_{\hat k}\|_2}{4 b_1}\min\Big[\Delta_k,\ \frac{\|h_{\hat k}\|_2}{2 b_2}\Big].
$$

If all $k \ge \hat k$ are not acceptable steps, then we contradict Theorem 5.1. Hence, there exists an infinite sequence of indices indexing successful steps inside the ball. For any such $k$ we have

$$
\Phi_k - \Phi_{k+1} = \mathrm{Ared}_k \ge \eta_1 \mathrm{Pred}_k
\ge \frac{\eta_1\,\|h_{\hat k}\|_2}{4 b_1}\min\Big[\Delta_k,\ \frac{\|h_{\hat k}\|_2}{2 b_2}\Big]. \tag{8.1}
$$

Since $\{\Phi_k\}$ is bounded below and $\|h_{\hat k}\|_2 > 0$, inequality (8.1) implies that

$$
\liminf_{k \to \infty} \Delta_k = 0. \tag{8.2}
$$

Define $\sigma_1$ to be a constant that satisfies:

$$
\sigma_1 \le \min\Big[1,\ \frac{a\, b\, \Delta_{\hat k}}{\alpha_1 r_* (1 - \eta_2)},\ \frac{\|h_{\hat k}\|_2}{2}\Big],
$$

where $a = \max[\,r_*,\ 2 r_* a_4\,]$ and $b = \max[\,b_1,\ b_2\,]$. Now, because of (8.2), there exists some sufficiently large $k$ such that

$$
\Delta_k \le \frac{\alpha_1 \sigma_1 r_*}{a\, b}\,(1 - \eta_2). \tag{8.3}
$$

Let $m$ be the first integer greater than $\hat k$ such that (8.3) holds. This implies that $m \ge \hat k + 1$, and using (2.3) we get

$$
b\,\|s_{m-1}\|_2 \le \frac{b}{\alpha_1}\,\Delta_m \le \frac{\sigma_1 r_*}{a}\,(1 - \eta_2) \tag{8.4}
$$
$$
\le \sigma_1 (1 - \eta_2) \le \sigma_1. \tag{8.5}
$$

Now, by using Lemma 6.1 and the fact that $r_{m-1} \ge 1$, we obtain

$$
\mathrm{Pred}_{m-1} \ge \frac{\|h_{m-1}\|_2}{2 b}\min\Big[\|s_{m-1}\|_2,\ \frac{\|h_{m-1}\|_2}{b}\Big], \tag{8.6}
$$

and since $m - 1 \ge \hat k$, $x_{m-1}$ lies inside the ball $B_\sigma$ and by using the definition of $\sigma_1$ above, we have

$$
\|h_{m-1}\|_2 \ge \frac{\|h_{\hat k}\|_2}{2} \ge \sigma_1. \tag{8.7}
$$

From (8.5) and (8.7) we have

$$
b\,\|s_{m-1}\|_2 \le \|h_{m-1}\|_2.
$$

By substituting the last inequality and (8.7) into (8.6), we obtain

$$
\mathrm{Pred}_{m-1} \ge \frac{\sigma_1}{2 b}\,\|s_{m-1}\|_2.
$$

But, by Corollary 6.4,

$$
|\mathrm{Ared}_{m-1} - \mathrm{Pred}_{m-1}| \le a_4 r_*\,\|s_{m-1}\|_2^2. \tag{8.8}
$$

So,

$$
\frac{|\mathrm{Ared}_{m-1} - \mathrm{Pred}_{m-1}|}{\mathrm{Pred}_{m-1}} \le \frac{2 a_4 b\, r_*\,\|s_{m-1}\|_2^2}{\sigma_1\,\|s_{m-1}\|_2}.
$$

Now using (8.4), we obtain

$$
\frac{|\mathrm{Ared}_{m-1} - \mathrm{Pred}_{m-1}|}{\mathrm{Pred}_{m-1}} \le \frac{2 a_4 r_* \sigma_1}{\sigma_1\, a}\,(1 - \eta_2) \le (1 - \eta_2).
$$

This implies that

$$
\frac{\mathrm{Ared}_{m-1}}{\mathrm{Pred}_{m-1}} \ge \eta_2.
$$

Hence from the rule of updating the radius of the trust region, we have

$$
\Delta_{m-1} \le \Delta_m.
$$

The last inequality implies that $k = m - 1$ satisfies (8.3). This contradicts the supposition that $m$ is the smallest such index and means that there is no $m > \hat k$ such that (8.3) holds. Hence, for all $k \ge \hat k$, we have

$$
\Delta_k > \frac{\alpha_1 \sigma_1 r_*}{a\, b}\,(1 - \eta_2),
$$

which contradicts (8.2). Hence, eventually $\{x_k\}$ must leave the ball $B_\sigma$ for some $k > \hat k$. Let $l + 1$ be the first integer greater than $\hat k$ such that $x_{l+1}$ does not lie inside the ball $B_\sigma$. Since $x_{l+1} \ne x_{\hat k}$, there must exist at least one acceptable step in the set of iterates indexed $\{\hat k, \ldots, l\}$, so by Lemma 6.1,

$$
\Phi_{\hat k} - \Phi_{l+1} = \sum_{k=\hat k}^{l} (\Phi_k - \Phi_{k+1}) \ge \sum_{k \in S(\hat k, l)} \eta_1 \mathrm{Pred}_k
\ge \sum_{k \in S(\hat k, l)} \frac{\eta_1\,\|h_{\hat k}\|_2}{4 b_1}\min\Big[\Delta_k,\ \frac{\|h_{\hat k}\|_2}{2 b_2}\Big].
$$

If $\Delta_k \le \dfrac{\|h_{\hat k}\|_2}{2 b_2}$ for all $k \in S(\hat k, l)$, then

$$
\Phi_{\hat k} - \Phi_{l+1} \ge \frac{\eta_1\,\|h_{\hat k}\|_2}{4 b_1}\sum_{k \in S(\hat k, l)} \Delta_k \ge \frac{\eta_1\,\|h_{\hat k}\|_2}{4 b_1}\,\sigma.
$$

Otherwise,

$$
\Phi_{\hat k} - \Phi_{l+1} \ge \frac{\eta_1\,\|h_{\hat k}\|_2^2}{8 b_1 b_2}.
$$

In either case

$$
\Phi_{\hat k} - \Phi_{l+1} \ge \frac{\eta_1\,\|h_{\hat k}\|_2}{4 b_1}\min\Big[\sigma,\ \frac{\|h_{\hat k}\|_2}{2 b_2}\Big]
= \frac{\eta_1\,\|h_{\hat k}\|_2}{4 b_1}\min\Big[\frac{\|h_{\hat k}\|_2}{2\beta},\ \frac{\|h_{\hat k}\|_2}{2 b_2}\Big]
= \frac{\eta_1\,\|h_{\hat k}\|_2^2}{8 b_1}\min\Big[\frac{1}{\beta},\ \frac{1}{b_2}\Big]. \tag{8.9}
$$

Since $\{\Phi_k\}$ is bounded below and is a decreasing sequence, $\{\Phi_k\}$ converges to some limit $\Phi_*$. Taking the limit as $l$ goes to infinity in inequality (8.9), we obtain

$$
\Phi_{\hat k} - \Phi_* \ge \frac{\eta_1\,\|h_{\hat k}\|_2^2}{8 b_1}\min\Big[\frac{1}{\beta},\ \frac{1}{b_2}\Big].
$$

If we now take the limit as $\hat k$ goes to infinity, we obtain

$$
0 \ge \frac{\eta_1\,\epsilon_0^2}{32 b_1}\min\Big[\frac{1}{\beta},\ \frac{1}{b_2}\Big],
$$

which contradicts $\epsilon_0 > 0$. The supposition is wrong and hence the lemma is proved. ■


Lemma  8.2 


Let the standard assumptions hold. If each member of the sequence of iterates generated by the algorithm does not satisfy the termination condition (3.1), then

$$
\liminf_{k \to \infty} \|P_k \nabla l_k\|_2 = 0.
$$


Proof

The proof is by contradiction. Suppose that there exists an $\epsilon_0 > 0$ and an integer $K$ such that $\|P_k \nabla l_k\|_2 \ge \epsilon_0$ for $k \ge K$.

By using (6.9),

$$
\|P_k(\nabla l_k + B_k s_k^n)\|_2 \ge \|P_k \nabla l_k\|_2 - b_5\,\|h_k\|_2.
$$

From Lemma 8.1, there exists $k_1$ sufficiently large such that for all $k \ge k_1$, we have

$$
\|h_k\|_2 \le \frac{1}{2 b_5}\,\epsilon_0.
$$

Thus for $k \ge \max[\,K,\ k_1\,]$,

$$
\|P_k(\nabla l_k + B_k s_k^n)\|_2 \ge \tfrac12\,\epsilon_0.
$$

Now, since from (6.4) and Lemma 6.8,

$$
\mathrm{Pred}_k \ge \tfrac14\,\|P_k(\nabla l_k + B_k s_k^n)\|_2 \min\Big[\bar\Delta_k,\ \frac{\|P_k(\nabla l_k + B_k s_k^n)\|_2}{2 b_4}\Big]
- b_5\,\|s_k\|_2\,\|h_k\|_2 - \big(a_5\,\|s_k\|_2 + a_6\,\|s_{k-t_k}\|_2\big)\|h_k\|_2,
$$

and since $\|h_k\|_2$ converges to zero and $\|s_k\|_2$ and $\|s_{k-t_k}\|_2$ are bounded, then there exists an integer $k_2 \ge \max[\,K,\ k_1\,]$ such that for all $k \ge k_2$ we have

$$
\mathrm{Pred}_k \ge \tfrac18\,\|P_k(\nabla l_k + B_k s_k^n)\|_2 \min\Big[\tfrac12 \Delta_k,\ \frac{\|P_k(\nabla l_k + B_k s_k^n)\|_2}{2 b_4}\Big].
$$

Thus, for all $k \ge k_2$, we have

$$
\mathrm{Pred}_k \ge \frac{\epsilon_0}{16}\min\Big[\tfrac12 \Delta_k,\ \frac{\epsilon_0}{4 b_4}\Big].
$$

From Theorem 5.1 there exists an infinite sequence of successful iterations. Now, for any successful iteration indexed $k \ge k_2$, we have

$$
\mathrm{Ared}_k \ge \eta_1 \mathrm{Pred}_k \ge \frac{\eta_1}{32}\,\epsilon_0 \min\Big[\Delta_k,\ \frac{\epsilon_0}{2 b_4}\Big].
$$

Taking $k_2 \ge \bar k$, where $\bar k$ is as in (7.8) (which we may assume by enlarging $k_2$ if necessary), the last inequality and the assumption that $\{\Phi_k\}$ is bounded below imply that

$$
\infty > \sum_{k=k_2}^{\infty} (\Phi_k - \Phi_{k+1}) = \sum_{k \in S(k_2, \infty)} \mathrm{Ared}_k
\ge \sum_{k \in S(k_2, \infty)} \frac{\eta_1}{32}\,\epsilon_0 \min\Big[\Delta_k,\ \frac{\epsilon_0}{2 b_4}\Big].
$$

This implies that

$$
\liminf_{k \to \infty} \Delta_k = 0. \tag{8.10}
$$

This means that there exists an integer $k_3 \ge k_2$ such that

$$
\Delta_k \le \frac{\alpha_1 \sigma_2}{a}\,(1 - \eta_2) \tag{8.11}
$$

is satisfied for some $k > k_3$, where $a = \max\Big[1,\ \dfrac{32\, a_4 r_*}{\epsilon_0}\Big]$ and $\sigma_2$ is defined to be a constant that satisfies

$$
\sigma_2 \le \min\Big[1,\ \frac{a\,\Delta_{k_3}}{\alpha_1 (1 - \eta_2)},\ \frac{\epsilon_0}{2 b_4}\Big].
$$

Let $m$ be the first integer greater than $k_3$ such that (8.11) holds. This implies that $m \ge k_3 + 1$. So, from (2.3),

$$
\|s_{m-1}\|_2 \le \frac{1}{\alpha_1}\,\Delta_m \le \frac{\alpha_1 \sigma_2}{\alpha_1 a}\,(1 - \eta_2) \tag{8.12}
$$
$$
\le \sigma_2 (1 - \eta_2) \le \sigma_2 \le \frac{\epsilon_0}{2 b_4}.
$$

We obtain

$$
\mathrm{Pred}_{m-1} \ge \frac{\epsilon_0}{32}\,\|s_{m-1}\|_2.
$$

So, by using Corollary 6.4, (8.12), and the last inequality, we get

$$
\frac{|\mathrm{Ared}_{m-1} - \mathrm{Pred}_{m-1}|}{\mathrm{Pred}_{m-1}} \le \frac{32\, a_4 r_*\,\|s_{m-1}\|_2^2}{\epsilon_0\,\|s_{m-1}\|_2}
\le \frac{32\, a_4 r_* \sigma_2}{\epsilon_0\, a}\,(1 - \eta_2)
\le \sigma_2 (1 - \eta_2) \le (1 - \eta_2).
$$

The last inequality implies that

$$
\frac{\mathrm{Ared}_{m-1}}{\mathrm{Pred}_{m-1}} \ge \eta_2.
$$

Hence, from the rule of updating the radius of the trust region in Scheme 2.2, we obtain

$$
\Delta_{m-1} \le \Delta_m.
$$

This implies that $m - 1$ satisfies (8.11), which contradicts the assumption that $m$ is the smallest integer $> k_3$ such that (8.11) holds. Hence, for all $k \ge k_3$, we have

$$
\Delta_k > \frac{\alpha_1 \sigma_2}{a}\,(1 - \eta_2).
$$

The last inequality contradicts (8.10). The supposition is contradicted and hence the lemma is proved. ■


Now let us again state, and then prove, our main global convergence result, Theorem 5.2.

Theorem 5.2

Under the standard assumptions, the algorithm produces iterates $\{x_k\}$ which satisfy

$$
\liminf_{k \to \infty}\big[\,\|h_k\|_2 + \|P_k \nabla l_k\|_2\,\big] = 0.
$$

Proof 

The  proof  follows  immediately  from  Lemma  8.1  and  Lemma  8.2.  ■ 

9.  Concluding  Remarks 

We have presented a global convergence analysis for a variant of the 1984 Celis-Dennis-Tapia algorithm in which we use a different scheme for updating the penalty parameter. This scheme ensures that the merit function is decreased at each iteration by at least a fraction of Cauchy decrease. This indicates compatibility with the choice of $\theta_k$ in the CDT subproblem.

To force global convergence, we have employed, as a merit function, the augmented Lagrangian, which is naturally compatible with the subproblem. For more details, see Celis, Dennis, Martinez, Tapia, and Williamson (1989).

Schittkowski (1983), Gill, Murray, Saunders, and Wright (1986) and Powell and Yuan (1986-a and 1986-b) have also considered this function as a merit function.

Powell and Yuan (1986-a and 1986-b) used the least-squares multiplier estimate to update the estimate of the multiplier $\lambda$, and hence they treated it as a function of $x$ rather than a separate variable. They proved several global and local convergence properties using this merit function. We prefer our way of updating the multiplier $\lambda$ for several reasons, including the fact that it is less expensive to calculate than Powell and Yuan's choice.

For future work, there are many questions that need to be answered.

Although intensive numerical investigation with the CDT algorithm was reported by Celis, Dennis and Tapia (1985), Celis (1985) and Celis, Dennis, Martinez, Tapia, and Williamson (1989), we believe that the implementation of the algorithm must be refined. In particular, an efficient algorithm for solving the CDT subproblem is needed. This will require a closer look at the CDT subproblem and the characteristics of its solution. Currently, this is a topic of research, e.g. Yuan (1987) and Zhang (1988), but the problem has not been solved.

A related important question is how to use a secant approximation of the Hessian of the Lagrangian in order to produce a more efficient algorithm. We believe that Tapia (1988) will be of considerable value here.

Another important topic that we expect to consider is how to incorporate inequality constraints into the formulation of the algorithm.